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Abstract 

An  automatic  sunspot  detection  and  classification  method  is  developed  combin¬ 
ing  HMII  and  HMIM  imagery  procured  from  the  Solar  Dynamics  Observatory.  It¬ 
erative  global  thresholding  methods  are  employed  for  detecting  sunspots.  Groups 
are  selected  based  on  heliographic  distance  between  sunspots  via  area-based  group¬ 
ing  lengths.  Classifications  are  applied  through  logical  operators  adhering  to  the 
standard  McIntosh  classification  system.  Calculated  sunspot  parameters  and  clas¬ 
sifications  are  validated  in  three  way  comparisons  between  code  output,  Holloman 
AFB  and  the  Space  Weather  Prediction  Center.  Accuracy  is  achieved  within  the 
margin  of  difference  between  Holloman  and  SWPC  reports  for  sunspot  area,  num¬ 
ber  of  groups,  number  of  spots,  and  McIntosh  classification  using  data  spanning  6 
July  2012  to  29  June  2013:  SWPC/Holloman  (33.38%, 57.48%, 87.67%),  SWPC/SDO 
(20.22%, 51.25%, 83.80%),  and  SDO/Holloman  (24.54%, 50.91%, 80.65%).  The  auto¬ 
matic  classification  system  is  used  to  evaluate  bias  inherent  in  Holloman  classification 
methods.  Parameters  are  altered  to  reach  optimal  match  percentages  with  Holloman, 
indicating  differences  between  computed  parameter  values  and  hand-calculated  coun¬ 
terparts.  Group  length  cutoffs  are  shown  to  differ  by  2.5°,  eccentricity  is  quantified 
at  0.8,  and  penumbra  length  cutoffs  are  shown  to  exceed  differences  of  1.4°  from 
McIntosh  values. 
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FULLY  AUTOMATED  SUNSPOT  DETECTION  AND  CLASSIFICATION 
USING  SDO  HMI  IMAGERY  IN  MATLAB 

I.  Introduction 

1.1  Space  Weather  Operations 

United  States  Air  Force  (USAF)  instructions  currently  dictate  that  weather 
personnel  at  three  optical  solar  observatories  around  the  world  complete  sunspot 
drawings  by  hand  once  a  day  and  transmit  the  results  to  the  Space  Weather 
Prediction  Center  (SWPC).  These  sunspot  drawings  are  performed  by  solar  analysts 
at  three  geographically  separated  bases:  Holloman  Air  Force  Base  (AFB)  in  New 
Mexico,  Learnronth  Solar  Observatory  in  Australia,  and  San  Vito  Solar  Observatory 
in  Italy.  Drawings  are  completed  on  a  sheet  of  paper,  copying  a  projected  image  of 
the  sun  from  a  solar  telescope  at  optimal  times  throughout  the  day,  generally  when 
visibility  is  best.  Because  seeing  conditions  change  throughout  the  day  and 
atmospheric  conditions  can  cause  difficulties  with  observing  quality,  this  method  can 
fail  to  produce  accurate  results  or  produce  biased  results  under  a  number  of 
different  circumstances.  Each  solar  analyst  may  trace  spots  differently,  and  the 
technique  by  which  sunspots  are  grouped  is  not  uniform  across  analysts,  especially 
for  sunspots  found  in  regions  near  the  outer  limb  of  the  Sun.  Geometric 
foreshortening  effects  may  also  lead  to  poor  area  approximations  and  misjudgment 
of  the  size  or  completeness  of  penumbra  surrounding  each  umbra. 

Once  sunspot  analysis  at  each  base  is  complete,  drawings  are  collected  by 
SWPC  and  subsequently  used  to  determine  the  location  and  classification  of 
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sunspots  and  a  flare  probability  for  the  next  24  hours.  When  the  weather  does  not 
allow  for  a  drawing  to  be  performed,  no  report  is  submitted  to  SWPC.  While  this 
method  can  be  effective,  sunspot  evolution  can  often  times  be  more  dynamic  and 
change  on  time  scales  shorter  than  24  hours. 

Sunspot  location  and  classification  have  been  correlated  with  solar  flares  and 
Coronal  Mass  Ejections  (CMEs)  [Bornmann  and  Shaw,  1994],  This  correlation  can 
be  used  to  forecast  the  probability  of  solar  flares  for  any  given  day  which  can  impact 
military  and  civilian  operations.  While  forehand  knowledge  of  a  flare  or  CME  does 
not  eliminate  the  effects  of  the  storm,  it  can  provide  useful  information  for  finding 
the  degree  of  influence  each  solar  storm  can  have  on  earth.  In  any  case,  prior 
knowledge  is  valuable  to  the  Air  Force  in  order  to  create  a  better  system  for 
analyzing  and  predicting  solar  weather  to  a  point  that  saves  time  and  money, 
subsequently  increasing  Air  Force  effectiveness  in  global  reach  and  power  projection. 

1.2  Objectives 

Flare  statistics  are  calculated  by  dividing  the  number  of  times  a  flare  was  seen 
from  a  group  with  a  specific  classification  by  the  total  number  of  times  that 
classification  has  been  seen.  For  example,  if  a  flare  results  every  time  a  classification 
is  seen,  that  category  of  sunspot  will  be  given  a  flare  probability  of  1.  However, 
pairing  the  fact  that  sunspot  evolution  is  dynamic  with  the  fact  that  sunspot 
classifications  may  only  be  updated  once  a  day,  it  can  be  seen  that  there  exists  a 
potential  to  wrongfully  attribute  flares  to  classes  that  have  little  connection  to  solar 
flares.  In  addition,  because  of  human  bias,  incorrectly  classified  sunspot  groups  may 
wrongfully  acqiiire  flare  counts.  Therefore,  a  better  understanding  of  the  time 
evolution  of  sunspot  groups  and  a  consistent,  unbiased  classification  method  may 
help  improve  forecasting.  Two  ways  to  implement  this  improvement  include 
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increasing  the  speed  of  detection  and  classification  of  sunspots  as  well  as  eliminating 
human  bias  through  automation  of  the  process.  While  past  attempts  at  this  task 
have  yielded  accurate  results,  summarized  by  Aschwanden’s  review  of  automated 
solar  analysis  tools  [ Aschwanden ,  2010],  implementation  of  the  automation  process 
using  images  from  the  Solar  Dynamics  Observatory  (SDO)  has  only  recently  been 
adopted  [  Watson ,  2012],  SDO  images  can  significantly  improve  the  process  clue  to 
greater  resolution  images  and  the  absence  of  terrestrial  atmosphere  between  the 
source  and  observation  point.  Successful  development  of  this  process  has  the 
potential  to  provide  an  updated  classification  to  AF  weather  personnel  every  minute 
and  will  give  large  amounts  of  new  information  on  sunspot  evolution  that  may 
improve  flare  prediction  in  the  future.  The  purpose  of  this  research  is  to  provide  a 
better  algorithm  for  sunspot  detection  and  classification  based  in  Matrix  Laboratory 
(MATLAB)  that  can  be  useful  for  AF  implementation  aiding  solar  analysts  in  their 
classification  duties. 

1.3  Previous  Research 

Because  of  the  cost  and  practicality  associated  with  physically  measuring 
aspects  of  interplanetary  space  and  the  solar  atmosphere,  modeling  based  on  solar 
observations  is  the  primary  method  for  researching  the  Sun.  A  robust  formulation  of 
sunspot  classification  was  put  to  use  for  thousands  of  previous  observations  made 
over  the  course  of  nearly  a  decade  from  1969-1976  [ McIntosh ,  1990].  This  process 
greatly  improved  the  previously  used  Zurich  classification  by  modifying  the  general 
groups  outlined  by  Kiepenheuer  [ Kiepenheuer ,  1953],  and  supplementing  with 
additional  elements  to  differentiate  between  sunspot  groups.  Automation  of  the 
detection  process  has  been  in  the  works  since  the  late  1990s  [ Al- Omari  et  al,  2009; 
Benkhalil  et  al.,  2004,  2003;  Colak  and  Qahwaji ,  2008;  Curto  et  al,  2008;  De  Wit, 
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2006;  Delouille  et  al,  2012;  Nguyen  et  al,  2005,  2006;  Park,  2011;  Qahwaji  and 
Colak,  2005,  2006,  2007;  Turmon  et  al,  1998,  2002;  Verbeeck  et  al,  2013;  Watson 
and  Fletcher,  2010;  Watson,  2012;  Zharkov  et  al,  2005],  but  a  widely  successful 
algorithm  for  both  detection  and  classification  has  yet  to  be  developed.  Accurate 
methods  for  active  region  detection  exist,  but  systems  for  successfully  assigning  a 
classification  in  a  wide  variety  of  circumstances  have  generally  dwindled  around  a 
50-70%  [Benkhalil  et  al,  2004;  Colak  and  Qahwaji,  2008;  Jewalikar  and  Singh ; 
Nguyen  et  al,  2006;  Park,  2011;  Watson,  2012;  Turmon  et  al,  1998]. 

Many  previous  methods  focus  on  a  specific  type  of  image  processing,  relying 
purely  on  a  variation  of  mathematical  morphology  or  simple  logical  operations 
[ Gonzalez  and  Eddins,  2009].  In  addition,  some  methods  have  attempted  to 
incorporate  magnetogram  images  [ Colak  and  Qahwaji,  2008]  into  the  classification 
section  to  better  determine  spot  polarity.  Previous  iterations  of  this  research  have 
combined  other  resources  to  come  up  with  automatic  detection  and  classification, 
but  only  one  group  has  looked  at  using  SDO  images  for  classification  (most  others 
use  ground  based  telescopes  or  imagery  from  the  Solar  and  Heliospheric 
Observatory  (SOHO)  satellite).  The  main  benefit  with  SDO  imagery  will  be  a 
minimum  of  four  times  improvement  on  spatial  resolution.  Additionally,  the  SDO  is 
located  in  an  inclined  geosynchronous  orbit  meaning  images  it  produces  do  not 
require  corrections  from  random  atmospheric  disturbances  as  terrestrial  satellites 
would  [Pesnell  et  al,  2012],  The  method  pursued  by  [  Watson ,  2012]  was  different  in 
that  the  goal  for  each  processed  image  was  to  provide  a  time  evolution  step  for  a 
sunspot  tracking  algorithm.  In  addition  to  various  other  grouping  methods,  the  use 
of  neural  networks  has  been  pursued  [Colak  and  Qahwaji,  2008;  De  Wit,  2006; 

S ocas- Navarro,  2005]  for  implementation  in  the  classification  process.  The  use  of  an 
adaptive  classification  process  based  on  previously  made  classifications  is,  by 
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definition,  biased  and  therefore,  the  use  of  neural  networks  (beyond  the  simple 
forward-propagating  network  model  of  input  to  output)  is  neglected  in  this  research. 

1.4  Document  Outline 

The  following  chapters  of  this  thesis  are  organized  into  background 
information,  research  methodology,  results  analysis,  and  conclusions.  Background 
information  presented  in  Chapter  II  includes  a  discussion  of  the  Sun  as  a  whole,  and 
introduces  specific  terms  and  concepts  used  throughout  the  document.  Chapter  III 
outlines  the  development  of  the  automated  code  and  discusses  some  of  the  features 
associated  with  the  specific  route  of  image  processing  that  was  taken.  An  analysis  of 
the  data  produced  by  the  automated  code  is  presented  in  Chapter  IV.  Finally, 
Chapter  V  summarizes  the  results  of  this  research  and  presents  additional  topics  for 
future  work.  Each  of  these  sections  incorporates  information  identified  in  previous 
chapters  and  assumes  a  small  working  knowledge  of  solar  physics. 
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II.  Background 


A  general  outline  of  solar  physics  is  presented  for  completeness.  The  Solar 
Cycle  is  discussed  to  illustrate  how  sunspot  groups  appear  over  large  amounts  of 
time.  Additionally,  differential  rotation  and  limb  darkening  are  developed  to 
illustrate  the  purpose  of  these  correction  steps  written  into  the  following  chapters. 
The  classification  system  for  sunspots  is  reiterated  from  the  McIntosh  paper  and  for 
reference.  Finally,  the  automatic  approach  through  the  use  of  SDO  imagery  is 
addressed  in  conjunction  with  background  on  the  imagery  itself. 

2.1  Solar  Cycle 

The  dynamic  evolution  of  solar  properties  lead  to  the  discovery  of  a  cyclical 
pattern  of  increasing  and  decreasing  solar  activity  [Foukal,  2008].  Made  evident  by 
the  increasingly  common  presence  of  sunspots  on  the  solar  disk,  this  oscillatory 
pattern  is  repeated  with  a  fairly  regular  period.  This  Solar  Cycle  is  marked  by 
increasing  solar  activity  over  the  corresponding  11  year  time  span.  Beginning  with 
the  solar  minimum  when  the  sun  exhibits  the  least  amount  of  magnetic  activity,  the 
solar  cycle  starts  by  ramping  up  the  intensity  and  occurrences  of  active  regions.  As 
the  magnetic  held  of  the  sun  becomes  more  twisted  with  time,  the  frequency  of 
sunspot  appearances  on  the  surface  of  the  sun  increases,  and  the  latitude  of  these 
sunspot  groups  begins  to  shift  towards  the  solar  equator.  Culminating  with  the 
solar  maximum  at  the  end  of  the  cycle,  the  magnetic  held  of  the  sun  experiences  a 
full  reversal  before  the  process  begins  again.  Therefore,  the  magnetic  held  of  the  sun 
reverses  polarity  every  11  years.  This  reversal  is  thought  to  come  about  as  a  result 
of  active  regions  shifting  towards  the  solar  equator  as  the  cycle  progresses,  slowly 
hipping  the  polarity  of  the  sun’s  magnetic  held  [ Foukal ,  2008]. 
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2.2  Differential  Rotation 


The  general  orbit-like  flow  of  plasma  at  varying  speeds  in  the  sun  is  known  as 
differential  rotation.  Radially  driven  convective  regions  seem  to  be  the  cause  of  the 
different  flow  rates  of  plasma  at  different  heights  in  the  sun.  This  radial  motion  of 
plasma  is  a  result  of  the  buoyancy  of  hot  plasma  at  the  bottom  of  the  convective 
region  moving  towards  the  photosphere.  The  rotational  motion  of  plasma  can  be 
measured,  seen  as  a  slight  blue  and  red  shift  near  the  edge  of  the  sun,  the  degree  of 
which  changes  with  latitude.  The  rotation  is  referred  to  as  differential  because  the 
speed  of  rotation  varies  with  solar  latitude  and  radial  depth  into  the  sun.  The 
differential  rotation  rate  varies  up  to  10  full  days  for  a  complete  rotation  depending 
on  the  specific  part  of  the  photosphere  being  observed.  While  this  effect  is 
considered  in  sunspot  tracking  methods,  it  has  no  effect  on  the  limb  darkening 
correction  applied  to  the  solar  disk  to  flatten  the  intensity  drop  off. 

2.3  Limb  Darkening 

The  photosphere  is  considered  to  be  the  surface  of  the  sun  with  a  depth  of 
approximately  500  km  [Foukal,  2008].  This  definition  is  awkward  however,  because 
there  is  no  hard  cut  off  where  the  density  of  gaseous  plasma  from  the  sun  becomes 
negligible,  as  would  be  the  case  in  a  solid  or  liquid.  Instead,  the  Beer-Lambert  law 
is  used  to  concretely  define  a  surface  using  the  optical  depth  definition  for  a  region 
of  the  sun  at  a  specific  wavelength.  In  looking  at  the  change  in  emission  intensity 
from  an  origin  point  to  a  location  at  angle  6  inside  an  absorbing  medium,  it  can  be 
shown  that 


Ix(0,  z  +  dz)  -  I\{6,  z )  =  [e\(z)  -  kx(z)Ix(0 ,  z)\secddz  (1) 
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where  K\  and  e\  are  the  absorption  and  emission  coefficients  respectively.  If  this 
change  is  approximated  as  a  derivative  with  respect  to  z,  the  instantaneous  change 
of  intensity  yields  a  relationship  for  the  distance  a  wave  can  penetrate.  If  the  optical 
depth  is  defined  as  dr\  =  — K\dz  and  emission  is  neglected,  the  ratio  of  final  to 
initial  intensity  takes  the  form 


dl[ 


A  =  e-rA 


(2) 


From  this  equation,  it  can  be  seen  that  as  r  increases,  the  intensity  drops 
exponentially.  This  means  that  as  depth  increases  and  r  also  increases,  the  amount 
of  light  that  penetrates  the  plasma  in  the  sun  decreases  rapidly.  For  r  —  1,  the 
intensity  of  the  source  has  dropped  to  37%  of  the  original  value.  The  surface  of  the 
sun  (Photosphere)  is  defined  to  be  the  location  where  optical  depth  r  is  equal  to  one 
at  the  hydrogen  a  wavelength  at  6173.3%.  Because  the  optical  depth  of  the  sun 
depends  on  a  specific  wavelength,  it  is  necessary  to  define  a  specific  part  of  the 
electromagnetic  spectrum.  The  optical  depth  is  also  a  function  of  temperature.  The 
large  variation  in  temperature  in  the  radial  direction  of  the  sun  therefore  causes  the 
change  in  optical  depth  to  vary  radially.  This  variation  ultimately  implies  that  as  an 
observer  looks  more  towards  the  edge  of  the  solar  disk,  the  radial  component  of 
their  observation  line  decreases  and  they  look  increasingly  tangent  to  the  surface  of 
the  sun.  An  observation  towards  the  edge  of  the  sun  therefore  means  that  one  does 
not  see  as  deep  into  the  sun,  but  that  the  location  where  r  =  1  is  higher  up  and 
therefore  cooler  as  seen  in  Figure  1.  The  location  where  r  =  1  is  not  at  an  equal 
radius  for  all  values  of  9,  meaning  that  the  temperature  of  the  plasma  at  the  r  =  1 
surface  is  not  uniform.  This  variation  in  temperature  along  the  r  surface  means 
that  the  maximum  emitted  wavelength  of  light  for  a  particular  region  depends  on 
the  angle  6  between  the  normal  vector  of  that  position  and  the  observer’s  location. 


The  left  sun  shows  concentric  rings  of  isothermal  regions  with  the  optical  depth 
surfaces  of  1  and  10  in  black  as  seen  from  a  distant  observer.  The  right  sun  shows 
concentric  rings  of  the  same  optical  depth  surfaces  with  the  corresponding 
temperature  profiles  varying  along  those  curves.  This  sun  represents  the  change  in 
maximum  emitted  wavelength  with  distance  from  the  center  of  the  solar  disk  from 
the  observer’s  perspective.  It  is  important  to  note  that  neither  sun  is  drawn  to 
scale;  the  optical  depth  surfaces  are  idealized. 

The  Plank  Distribution  function  that  determines  the  spread  of  electromagnetic 
waves  from  a  blackbody  for  a  specific  temperature  illustrates  that  the  peak 
wavelength  for  which  intensity  is  maximum  shifts  to  shorter  wavelength  for  higher 
temperature  is  given  by 


27 The2  1 

A  ^  expig?)-  1  U 

If  this  distribution  is  integrated  over  all  wavelengths  A,  the  result  would  be  1  when 
normalized  properly,  indicating  that  the  total  intensity  of  emitted  light  is 
distributed  over  all  possible  wavelengths.  From  Wein’s  displacement  law,  it  can  be 
shown  that  as  the  temperature  increases,  the  wavelength  at  which  the  blackbody 
emission  is  maximized  decreases  [ Foukal ,  2008].  Conceptually,  this  makes  sense 
because  as  wavelength  decreases,  frequency  increases  and  so  does  the  energy  of  each 
photon.  As  a  consequence,  when  the  sun  gets  cooler  further  out  on  the  limbs,  the 
maximum  emitted  wavelength  seen  at  an  optical  depth  of  1  in  those  regions 
becomes  longer,  making  the  limbs  appear  more  red  and  darker  than  the  center  of 
the  solar  disk  which  will  be  more  yellow. 

This  effect  can  also  be  visualized  as  the  intensity  drop  off  for  large  angles. 
When  an  observer  receives  the  emitted  radiation  at  an  angle  6  from  the  normal, 
there  will  be  a  decrease  in  the  amount  of  photons  he  or  she  can  receive.  If  the 
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Figure  1.  Optical  depth  dictates  the  intensity  profile  of  the  sun.  Near  the  center  of  the 
disk,  an  observer  can  see  farther  into  the  sun  where  the  temperature  of  the  plasma  is 
hotter.  I  is  the  intensity  of  light  coming  from  any  point  on  the  sun  and  is  a  function 
of  9  which  represents  the  angle  off  from  center  towards  an  observer,  r  is  the  optical 
depth  surface  where  r  =  1  is  the  surface  of  the  photosphere. 


radiator  emits  in  the  r  direction,  an  observer  only  accepts  the  absolute  value  of  the 
dot  product  between  the  propagation  vector  and  the  surface  normal.  Looking  at  the 
radiated  power  per  solid  angle, 


dP\  =  I\(r,9)dAcos9dud\ 


(4) 
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where  the  cos (0)  term  originates  from  this  dot  product,  the  radiated  power  falls  off 
as  a  function  of  cosine  near  the  edges  of  the  sun.  This  additional  way  of  viewing 
limb  darkening  is  also  summarized  in  Figure  1. 


2.4  Sunspots 

Sunspots  are  cooler  regions  on  the  photosphere  that  emit  a  lower  intensity  of 
light  in  all  directions,  therefore  appearing  darker  than  the  rest  of  the  sun.  Sunspots 
are  different  from  sunspot  groups,  the  latter  being  a  collection  of  the  former. 
Because  the  magnetic  held  of  the  sun  becomes  twisted  as  a  result  of  differential 
rotation,  magnetic  pressure  can  build  up  and  force  regions  of  higher  magnetic  held 
to  compile  within  the  sun.  As  the  balance  between  magnetic  and  thermal  pressure 
must  remain  constant,  these  regions  with  higher  magnetic  held  become  buoyant  and 
rise  to  the  surface,  carrying  the  magnetic  held  along  for  the  ride  [Foukal,  2008].  The 
Magnetic  Reynolds  Number  is  a  quantity  that  indicates  whether  or  not  flowing 
plasma  will  influence,  or  be  influenced  by,  a  magnetic  held  and  is  given  by 


„  v  x  (V  x  B) 

Rm  =  ~  ho vL 

—  V 2  B 

fi0a  v 


(5) 


where  a  is  the  conductivity  of  the  plasma  and  L  is  the  characteristic  length  of  the 
plasma  [Foukal,  2008].  Because  the  Magnetic  Reynolds  Number  shrinks  outside  of 
the  region  where  convection  dominates,  the  plasma  entangled  with  the  magnetic 
held  in  that  region  becomes  trapped  and  eventually  suspended  against  the  gravity 
of  the  sun.  Convection  in  these  regions  is  mostly  suppressed,  eliminating  the  biggest 
source  for  heat  on  the  photosphere  .  In  this  case,  the  magnetic  held  holds  plasma  in 
place,  preventing  sections  from  cycling  back  down  to  reheat.  As  the  heat  of  these 
suspended  regions  is  radiated  away,  the  local  opacity  within  the  magnetic  held 
begins  to  increase.  When  opacity  increases,  the  region  becomes  optically  thick  to 
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light  from  the  lower  levels  of  the  sun,  giving  the  active  region  a  darker  appearance. 
Sunspots  still  emit  radiation,  but  their  lower  temperature  causes  them  to  appear 
dark  compared  to  the  surrounding  unobstructed  photosphere.  Sunspots  can  be 
different  shapes  and  sizes  but  are  likely  proportional  to  the  configuration  of  the 
magnetic  field  in  their  immediate  region. 

2.4.1  Evolution  of  Sunspots. 

Although  suspended  by  the  magnetic  field  of  the  sun  in  the  photosphere, 
sunspots  are  not  static  objects.  As  the  magnetic  field  of  the  sun  changes,  sunspot 
groups  can  change  shape,  size  and  orientation  on  a  short  time  scale.  These  groups 
can  also  develop  additional  spots,  a  characteristic  that  will  contribute  to  the 
classification  of  the  group,  discussed  in  Section  2.5.  Additionally,  sunspot  groups 
can  drift  on  the  solar  surface,  and  the  location  of  sunspot  groups  on  the  solar  disk 
tend  to  decrease  in  latitude  as  the  solar  cycle  progresses  towards  a  maximum, 
known  as  Sporer’s  Law  [Foukal,  2008].  It  is  therefore  more  likely  to  find  spots  near 
the  solar  equator  when  nearing  a  solar  maximum. 

2.4.2  Important  Sunspot  Features. 

Because  there  are  many  different  aspects  that  make  up  a  sunspot  group,  it  is 
necessary  to  identify  key  features  that  could  be  identified  across  different  types  of 
groups.  Determined  from  observations  and  experience,  important  features  include 
the  number  and  proximity  of  spots,  area  of  each  sunspot  in  the  group,  as  well  as  the 
presence  or  absence  of  a  penumbra  around  the  central  umbra  [ McIntosh ,  1990].  The 
proximity  of  spots  refers  to  the  angular  separation  between  different  spots  that 
could  make  up  a  group.  The  area  of  each  sunspot  means  the  total  area  of  the 
sunspot  appearing  on  the  surface  of  the  sun  at  the  bottom  of  the  photosphere.  The 
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umbra  is  defined  to  be  the  darker  center  region  in  the  sunspot  while  the  penumbra 
is  the  region  surrounding  the  umbra  that  is  still  optically  thick  but  is  less  dark  than 
the  umbra  [Foukal,  2008].  Depending  on  the  combination  of  these  elements,  a 
sunspot  or  sunspot  group  can  be  given  one  of  60  different  classes  through  the 
McIntosh  system  [ McIntosh ,  1990]  that  are  each  associated  with  a  flare  probability 
based  on  previous  spot-flare  relations  with  that  same  classification. 

2.5  McIntosh  Classification 

The  features  of  a  sunspot  and  other  local  sunspots  considered  part  of  a  group 
are  assigned  a  classification,  defined  by  the  solar  astrophysics  community  using  a 
three  tier  classification  [ McIntosh ,  1990].  Based  on  a  previous  classification  system 
developed  by  [ Kiepenheuer ,  1953],  this  method  introduces  two  additional  categories 
by  which  analysts  can  differentiate  sunspot  groups.  This  system  takes  into  account 
size,  area,  orientation,  fullness,  and  density  of  spots  within  a  group  and  assigns  a 
three  letter  code  reported  in  the  form  ‘Zpc’.  In  this  notation,  ‘Z’  represents  the  first 
letter,  ‘p’  represents  the  second,  and  ‘c’  the  third  letter.  Each  letter  series,  shown  in 
Figure  2,  will  be  explained  in  the  subsequent  sections  (Sections  2.5.1  -  2.5.3). 

The  term  Zurich  classification  is  sometimes  used  to  describe  the  first  letter  in 
the  code  report  (‘Z’),  but  this  refers  to  the  modified  Zurich  classification  used  in  the 
McIntosh  scheme.  Each  part  of  the  McIntosh  code  may  also  be  referred  to  as 
“letters” ,  sometimes  interchangeably  with  the  description  of  what  each  letter 
represents.  For  example,  ‘p’  may  be  called  the  second  letter,  meaning  that  it  is 
reported  as  the  second  part  of  the  McIntosh  code,  even  though  it  may  be  written  as 
a  single  letter  in  this  context.  It  may  also  be  simple  to  think  about  the  second  letter 
as  though  it  were  written  ‘-p-’  where  the  1st  and  3rd  classifications  are  omitted. 
Although  the  letters  may  be  referred  to  as  single  items,  they  represent  one  part  of 
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Figure  2.  The  different  letters  of  th  McIntosh  classification  scheme  are  shown;  The  full 
classification  requires  one  letter  from  each  series.  Series  1  shows  the  modified  Zurich 
classification  letters.  Series  2  shows  the  penumbra  classification  letters.  Series  3  shows 
the  compactness  classification  letters. 


the  three  letter  code.  Every  sunspot  group  must  have  a  code  for  each  letter,  even 
though  these  letters  could  be  addressed  one  at  a  time.  The  Zurich  classification  is 
always  reported  as  a  capital  letter  while  the  2nd  and  3rd  letters  are  reported  as 
lower  case.  The  second  two  letters  do  not  overlap  by  using  any  of  the  same 
designators  (with  the  exception  of  ‘x’  which  has  the  same  meaning  in  both  cases). 

2.5.1  Modified  Zurich  Classification:  ‘Z’. 

The  first  section  of  the  McIntosh  classification  relates  to  the  modified  Zurich 
classification  of  a  sunspot  and  is  purely  determined  by  the  size  or  length  of  a 
sunspot  group  and  presence  of  a  penumbra  within  the  spot  group.  This  part  of  the 
classification  is  shown  in  series  1  in  Figure  2.  While  there  are  plenty  of  elements  to 
look  at  for  this  classification,  certain  subgroups  will  always  yield  a  given 
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Table  1.  Each  letter  of  the  modified  Zurich  classification  is  defined  by  a  specific  set  of 
parameters  including  the  presence  or  absence  of  penumbra  within  the  sunspot  group 
and  the  length  of  the  sunspot  group. 


A 

Unipolar  sunspot  group  with  no  penumbra,  either  the  formative  or  final  stage 
of  a  group 

B 

Bipolar  group  with  no  penumbra 

C 

Bipolar  group  with  penumbra  on  one  end  of  the  group 

D 

Bipolar  group  with  penumbra  on  spots  at  both  ends  of  the  group  with  length 
<  10° 

E 

Bipolar  group  with  penumbra  on  spots  at  both  ends  of  the  group  with  length 
10°  <  15° 

F 

Bipolar  group  with  penumbra  on  spots  at  both  ends  of  the  group  with  length 
>  15° 

H 

Unipolar  group  with  penumbra 

classification  and  some  classifications  are  rare  and  do  not  occur  on  a  regular  basis. 
For  example,  if  the  spot  is  unipolar  with  no  penumbra,  the  only  classification 
capable  of  being  assigned  is  ‘A’.  The  criteria  for  determining  the  modified  Zurich 
classification  is  shown  in  Table  1. 


2.5.2  The  Penumbra:  ‘p’. 

The  second  classification  depends  on  the  presence  or  absence  of  a  penumbra 
around  the  largest  spot  in  the  sunspot  group.  This  is  denoted  by  the  number  2  in 
Figure  2.  This  step  depends  on  the  definition  of  a  satisfactory  threshold  for 
determining  both  the  penumbra  and  the  umbra.  Clearly,  as  not  all  leading  spots 
have  penumbra,  there  is  a  limit  where  certain  groups  will  not  even  qualify  for  the 
second  tier  of  classification.  Therefore,  logical  determinations  for  this  step  are 
dependent  on  certain  situations  and  will  not  always  be  necessary.  The  criteria  for 
determining  the  Penumbra  classification  is  shown  in  Table  2. 
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Table  2.  Each  letter  of  the  penumbra  classification  is  defined  by  parameters  including 
the  size  and  shape  of  penumbra  on  the  largest  sunspot  within  the  sunspot  group. 


X 

Undefined,  no  penumbra 

r 

Rudimentary,  incomplete  penumbra  partially  surrounding  the  largest  spot  and 
granular  in  nature. 

s 

Small,  symmetric  penumbra.  Mature,  dark  penumbra  surrounding  the  leading 
spot.  The  North-South  diameter  of  the  penumbra  is  <  2.5° 

a 

Small,  asymmetric  penumbra.  Penumbra  of  the  largest  spot  is  irregular  and 
multiple  umbra  within  the  penumbra  exist.  North-South  diameter  of  the 
penumbra  is  <  2.5° 

h 

Large,  symmetric  penumbra.  Same  as  class  ‘s’  but  the  North-South  diameter 
of  the  penumbra  is  >  2.5° 

k 

Large,  asymmetric  penumbra.  Same  as  class  ‘a’  but  the  North-South  diameter 
of  the  penumbra  is  >  2.5° 

Table  3.  Every  letter  of  the  compactness  classification  is  defined  by  parameters  includ¬ 
ing  the  number  of  sunspots  in  the  group  and  the  number  of  sunspots  surrounded  by 
penumbra  in  the  group. 


X 

LIndefined,  group  is  unipolar. 

o 

Open  group.  Few,  if  any  spots  between  leading  and  following  spot.  Interior 
spots  are  generally  very  small. 

i 

Intermediate  group.  Numerous  spots  between  leading  and  following  spots.  No 
interior  spots  posses  penumbra. 

c 

Compact  group.  Area  between  leading  and  trailing  spots  is  heavily  populated 
and  at  least  one  interior  spot  posses  a  mature  penumbra. 

2.5.3  Sunspot  Distribution:  ‘c’. 

The  final  classification  relates  to  the  presence  or  absence  of  additional  spots 
between  the  leading  and  trailing  spot  in  the  sunspot  group.  The  more  spotted  the 
group  is,  the  more  irregular  the  magnetic  field  in  that  region.  Therefore,  the  three 
different  types  under  this  classification  pay  attention  to  the  increase  in  magnetic 
complexity  in  the  region  of  the  sunspot  group,  in  addition  to  the  spread  of  the 
convection  suppression.  Examples  of  this  third  category  are  shown  in  Figure  2.  The 
criteria  for  determining  the  sunspot  distribution  classification  is  shown  in  Table  3. 
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2.6  Solar  Dynamics  Observatory 


The  SDO  is  a  NASA  sponsored  satellite  that  was  launched  in  2010  to  observe 
the  sun  at  various  wavelengths  of  the  electromagnetic  spectrum.  The  SDO  orbits 
the  earth  in  a  circular  geosynchronous  orbit  at  an  inclination  of  28°  [Pesnell  et  al, 
2012],  An  orbit  of  this  type  yields  nearly  continuous  observation  of  the  sun  with  the 
exception  of  a  1-2  month  period  during  the  year  when  the  sun  is  blocked  by  the 
earth  for  a  short  period  of  time  every  day.  The  SDO  takes  a  high  resolution  photo 
of  the  sun  every  minute  at  many  different  wavelengths  in  order  to  get  a  full  picture 
of  solar  effects  and  their  impact  on  the  earth.  This  is  done  through  a  combination  of 
instruments:  the  Atmospheric  Imaging  Assembly  (AIA),  Extreme  Ultraviolet 
Variability  Experiment  (EVE),  and  the  Hclioseismic  and  Magnetic  Imager  (HMI) 

[. Pesnell  et  al. ,  2012],  There  are  two  types  of  images  focused  on  in  this  research.  The 
Erst  image  used  is  a  Hclioseismic  and  Magnetic  Imager  Intensitygram  (HMII)  full 
disk  continuum  image  centered  at  6173.3  ±  O.lA  [Schou  et  al,  2012],  providing  a 
high  spatial  resolution  image  of  the  photosphere  that  is  synchronized  with  the 
second  image,  a  full  disk  Hclioseismic  and  Magnetic  Imager  Magnetogram  (HMIM). 
The  HMI  instrument  measures  the  Zeeman  effect  variation  of  the  Fe  I  spectral  line 
coming  from  different  locations  on  the  photosphere,  yielding  the  Stokes  parameters 
necessary  to  determine  the  magnetic  polarity  of  those  regions  [ Pesnell  et  al,  2012], 
The  various  pixel  values  in  the  HMIM  image  represent  the  intensity  of  the  magnetic 
held  in  each  region,  white  being  opposite  black  and  gray  having  no  particular 
polarity.  Pixel  values  in  the  HMII  image  represent  the  intensity  of  light,  and  images 
can  be  obtained  in  color  or  grayscale. 
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2.7  White  Light  to  Fe  I  Line  Comparison 

Drawings  performed  on  the  Solar  Optical  Observing  Network  (SOON) 
telescope  use  white  light  projections  spanning  a  wider  range  of  wavelengths 
compared  to  SOHO  MD1  and  SDO  HMI  images.  Precedence  has  been  set  for  using 
images  centered  on  both  the  H-alpha  line  at  6563H  and  the  Ca  If  K  line  at  3934A 
[. Zharkov  et  al. ,  2005]  for  automated  detection  and  classification.  In  addition,  groups 
have  also  used  SOHO  MDI  imagery  (upon  which  the  SDO  HMI  instrument  is  based) 
centered  on  the  Ni  I  6768H  line  [ Aschwanden ,  2010;  Schou  et  al.,  2012],  However, 
there  are  differences  between  filtered  images  and  white  light  images,  primarily 
having  to  do  with  the  height  at  which  the  optical  depth  reaches  unity,  similar  to  the 
limb  darkening  effect  discussed  in  Section  2.3.  Therefore,  depending  on  the 
wavelength  being  observed,  the  height  of  the  observation  surface  will  change  and  the 
features  on  that  surface  may  be  different  than  the  features  an  observer  would  see  in 
white  light.  Note  that  electing  to  use  filtered  imagery  will  change  the  height  at 
which  sunspots  are  observed.  Wien’s  displacement  law  [. Foukal ,  2008]  shows  that 

2.89776829  x  10 6[nmK] 

~  ah 

This  law  is  used  to  show  that  the  temperature  at  the  surface  of  the  H-alpha  filtered 
image  is  approximately  4400K  and  the  temperature  at  the  surface  of  the  Ca  II  K 
filtered  image  is  approximately  7400K.  Both  of  these  temperatures  are  reached  in 
the  chromosphere,  the  layer  above  the  photosphere.  The  SDO  Fe  I  line  at  6173A 
sits  between  both  the  Ha  line  and  the  Ca  II  K  line,  meaning  that  it’s  temperature 
will  be  between  those  line’s  temperatures  as  well.  The  maximum  extent  of  the 
chromosphere  above  the  photosphere  is  approximately  2000  km  [ Foukal ,  2008], 
indicating  that  the  change  in  height  of  the  observation  surface  between  any  of  the 
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absorption  lines  and  the  white  light  image  (assumed  to  be  at  the  base  of  the 
photosphere)  will  be  less  than  100(2000/cm/695500/cm)  =  0.288%  the  radius  of  the 
sun.  While  this  height  difference  may  have  an  effect  on  the  shape  and  extent  of  the 
sunspots,  there  is  precedence  for  neglecting  this  difference  due  to  the  fact  that  the 
features  do  not  change  much  between  the  photosphere  and  the  base  of  the 
chromosphere.  The  main  difference  is  that  filtering  the  sun  to  a  specific  wavelength 
reduces  the  contrast  between  sunspot  and  quiet  sun,  introducing  some  difficulty  in 
detecting  the  darker  regions. 

Moreover,  the  SDO  method  uses  both  magnetic  and  optical  information  to 
perform  a  classification,  an  impossibility  when  using  other  types  of  images.  No  other 
image  produces  the  required  data  for  this  process,  so  differences  associated  with 
comparing  the  Fe  I  line  images  to  white  light  drawings  are  accepted.  These  errors 
are  assumed  to  be  negligible  when  comparing  sunspot  quantities  and  classifications. 

2.8  Automation  Approach 

The  first  step  in  the  completion  of  this  research  was  to  build  a  robust  method 
for  sunspot  detection  based  on  a  set  of  objective  rules  outlined  by  McIntosh 
[ McIntosh ,  1990].  In  using  the  largest  size  image  available,  produced  every  minute 
by  the  SDO,  the  resolution  of  the  automated  program  is  much  higher  than  many 
previously  implemented  methods.  This  portion  of  low  level  image  processing  is 
completed  carefully  so  as  to  minimize  the  error  associated  with  quantized  intensity 
values  and  the  calculation  of  important  elements  that  depend  on  approximations  like 
the  limb  darkening.  Once  the  initial  task  of  detection  is  completed,  a  series  of  logical 
operators  is  arranged  to  establish  groupings  as  well  as  a  classification  of  each  group 
based  on  the  McIntosh  Classification  system  covered  in  Section  2.5.  While  neural 
networks,  similar  to  those  used  by  [Colak  and  Qahwaji,  2008],  can  be  advantageous 
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when  training  a  computer’s  decision  making  with  direct  observational  data,  it  can 
be  argued  that  this  process  becomes  compromised  when  applied  to  a  system 
containing  significant  internal  discrepancies.  Discussed  in  Chapter  IV,  the  inherent 
difference  between  a  standard  observatory  and  national  standard  published  sunspot 
reports  yield  agreement  percentages  as  low  as  22.1%.  While  these  observations  are 
made  at  different  times  throughout  the  day  and  spot  group  evolution  is  expected, 
discrepancies  between  reporting  groups  are  not  small,  often  displaying  differences  of 
up  to  4  levels  in  Zurich  Classification  (A  to  D  or  B  to  E,  also  addressed  in  Chapter 
IV).  It  is  because  of  this  difference  that  the  implementation  of  a  strictly  objective 
classification  and  grouping  process  that  is  unbiased  by  any  human  input,  past  or 
present,  is  used.  This  emphasizes  that  comparisons  made  to  observatories  or 
reporting  entities  do  not  necessarily  represent  the  proximity  of  the  algorithm  to 
ground  truth.  On  the  contrary,  these  comparisons  simply  represent  the  proximity  of 
the  algorithm  to  to  the  average  classification  published  for  that  day  based  on  the 
varying  interpretation  of  McIntosh  classification  principles. 

2.9  Hand  Drawing  Method 

The  Air  Force  has  established  methods  for  observing  and  classifying  sunspots 
at  three  separate  optical  observatories  around  the  world.  Methodology  for 
classifying  and  reporting  of  these  sunspots  is  outlined  in  the  Air  Force  Manual 
(AFMAN)  15-124  [  USAF ,  2013].  All  analysts  are  trained  through  the  same  course 
taught  at  Holloman  AFB,  generally  lasting  a  little  over  one  month  to  cover  not  only 
the  steps  for  performing  the  drawing  and  classification  of  each  spot  group  observed, 
but  also  some  of  the  underlying  physics  behind  the  process.  It  should  be  noted  that 
spot  drawings  are  only  part  of  the  work  required  of  a  solar  analyst,  so  this 
instructional  period  also  covers  the  additional  duties  that  one  would  encounter  while 
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on  shift  at  a  solar  observatory,  to  include  the  time  critical  reporting  of  solar  flares 
that  may  be  seen  on  the  solar  limb.  A  full  description  of  the  hand  drawing  method 
used  at  AF  Solar  Observatories  is  addressed  in  Appendix  1. 


2.10  Resolution  Comparison 

Resolution  differences  between  a  white  light  drawing  board  and  a  full  size 
SDO  image  should  be  addressed.  The  CCD  in  the  HMI  instrument  on  the  SDO 
satellite  bins  photons  collected  through  the  aperture  and  converts  those  signals  into 
a  digital  count  value  corresponding  to  the  intensity  of  the  region.  This  binning 
causes  a  quantification  error  in  that  storing  information  in  a  pixel  reduces  a 
continuous  span  of  intensity  or  magnetic  polarity  data  from  a  span  of  the  sun  into 
discrete  values  that  a  computer  can  interpret.  This  quantization  error  can 
sometimes  be  significant,  but  it  is  important  to  realize  that  in  the  case  of  the  full 
4096  X  4096  SDO  Intensitygram  image,  the  spatial  span  of  a  single  pixel  is  very 
small.  To  compare  the  resolution  of  the  SDO  image  to  the  corresponding  resolution 
on  the  AFWA  Form  21,  note  that  on  average,  the  solar  diameter  in  pixel  space 
throughout  the  year  on  a  full  resolution  SDO  image  is  approximately  3800  pixels. 
Dividing  the  solar  diameter  of  18  cm  on  the  Form  21,  the  ratio  of  pixels  to  meters  is 
obtained,  and  the  resolution  that  one  pixel  corresponds  to  in  real  space: 


(1  pixel) - : - —  =  9.474  x  10  5m  (7) 

3800 pixels 

Converting  this  value  into  a  recognizable  unit,  it  can  be  seen  that  in  order  to 
draw  to  the  same  spatial  resolution  as  an  SDO  image,  an  analyst  would  need  pencil 
accuracy  of  about  95  microns,  a  difficult  feat  to  achieve.  It  can  therefore  be  said 
with  high  confidence  that  the  spatial  resolution  of  SDO  HMI  images  is  better  than 
the  spatial  resolution  of  the  SOON  telescope. 
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III.  Methodology 


This  chapter  details  the  development  of  the  sunspot  detection  and 
classification  algorithm  using  SDO  imagery.  To  do  this,  a  series  of  5  distinct  stages 
are  accomplished  in  order  to  produce  the  final  result:  the  sunspot  classification.  The 
first  stage  involves  the  acquisition  of  SDO  imagery  at  a  basic  level  of  processing.  In 
the  second  stage,  elementary  image  processing  techniques  are  used  to  condition  the 
data.  The  third  stage  involves  the  detection  of  sunspots  on  the  conditioned  solar 
images.  In  the  fourth  stage,  polarity  and  sunspot  location  information  is 
incorporated  to  obtain  sunspot  groupings.  Finally,  the  classification  stage  produces 
the  end  result  in  the  form  of  a  McIntosh  classification  for  each  detected  sunspot 
group.  This  process  is  summarized  in  Figure  3. 

A  simple  analysis  of  each  image  processing  tool  used  in  the  SDO  program 
justifies  every  step,  but  these  tools  are  well  established  and  therefore  not  addressed 
in  high  detail.  Following  the  development  of  the  automated  program,  methodology 
for  analysis  of  the  output  is  discussed  by  first  establishing  the  accuracy  of  the 
method  compared  to  the  output  from  two  trusted  sunspot  reporting  sources.  This 
accuracy  analysis  will  be  performed  in  Chapter  IV  with  respect  to  elements  such  as 
sunspot  area,  group  length,  and  McIntosh  classification. 

3.1  Image  Acquisition 

Images  were  downloaded  in  the  highest  available  resolution  from  the  NASA 
SDO  archives  for  processing  in  MATLAB.  In  order  to  procure  a  current  image  of 
the  sun,  or  any  archived  images  at  15  minute  time  steps  [ Pesnell  et  al,  2012],  the 
SDO  main  website  was  used  ( http://sdo.gsfc.nasa.gov/ ).  During  the  testing 
process,  a  years  worth  of  data  spanning  from  1  July  2012  to  30  June  2013  was 
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Figure  3.  The  five  basic  phases  of  the  SDO  code  are  summarized  in  a  flow  chart.  The 
corresponding  Sections  of  this  document  are  listed  adjacent  to  each  step. 


downloaded  from  the  SDO  website  with  the  aid  of  NASA  Goddard’s  input  of  image 
queries.  The  HMIM  image  is  downloaded  in  gray  scale  to  limit  the  size  of  the  image 
as  color  is  not  necessary  for  classification. 

3.2  Solar  Ephemeris 

In  order  to  obtain  the  rotation  parameters  of  the  sun  and  the  approximate 
position  of  the  SDO  with  respect  to  the  sun  at  the  time  any  image  is  taken,  the 
solar  ephemeris  is  needed  [Seidelmann  and  Urban ,  2010].  Conveniently,  there  are 
several  other  useful  parameters  that  can  be  extracted  from  the  solar  ephemeris  that 
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become  necessary  in  subsequent  sections  of  the  program,  so  the  computation  of  the 
position  of  the  SDO  satellite  is  essential  to  the  automation  process.  Numerical 
methods  for  calculating  the  solar  ephemeris  were  duplicated  from  the  Improved 
Solar  Observing  Optical  Network  (ISOON)  code  [ Meeus ,  1982;  Wilson,  1980]. 

First,  the  time  of  observation  is  taken  from  the  image  header  and  processed 
into  a  Julian  Date,  a  simple  single  digit  representation  of  time.  The  elapsed  time  is 
determined  from  January  1st,  1900  and  reported  in  terms  of  Julian  Days.  This 
number  is  used  to  calculate  the  longitude  of  perigee,  mean  anomaly,  equation  of 
center,  eccentricity,  and  the  right  ascension  of  the  ascending  node,  all  described  in 
[Meeus,  1982],  These  parameters,  displayed  in  Figure  4,  are  not  used  directly  in  the 
code  but  are  important  to  mention  for  completeness. 


True  Anomaly 
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Figure  4.  A  simplification  of  the  SDO  orbit  around  the  Sun  is  illustrated  by  the  blue 
ellipse  while  the  celestial  plane  is  shown  in  red.  Orbital  parameters  are  calculated  with 
reference  to  the  location  of  the  first  point  of  Aries. 


A  main  item  that  is  useful  in  the  calculation  of  the  solar  ephemeris  is  the 
approximate  radius  of  the  sun  as  seen  from  the  SDO.  Because  the  position  of  the 
earth  with  respect  to  the  sun  is  also  determined  from  the  ephemeris,  an 
approximation  for  the  sun  earth  distance  can  be  used  to  get  an  approximate  solar 
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radius  in  pixels.  To  put  the  radius  into  the  appropriate  units,  as  it  is  calculated  in 
meters,  it  must  be  multiplied  by  a  constant  of  proportionality  determined  through  a 
calibration  step.  To  do  this,  the  radius  of  images  spanning  an  entire  year  was 
combined  with  the  various  orbital  positions  determined  through  the  date.  By 
averaging  that  ratio  for  radius  and  position  values  for  each  image  sampled,  the 
average  constant  of  proportionality  was  experimentally  determined  to  be 
2.8476  x  1014.  Figure  5  shows  the  calculated  radius  values  for  a  specified  period  of 
time.  Sigma  is  the  size  of  the  smoothing  filter  used  to  calculate  the  radius  of  the  sun 
in  each  individual  image  via  the  Laplacian  of  a  Gaussian  comparison,  explained  in 
Section  3.3.1.  The  approximate  radius  of  the  sun  in  pixels  was  then  extracted  from 
the  ephcmeris  for  the  edge  detection  algorithm  to  ease  the  computational  intensity 
of  center  and  radius  determination,  discussed  in  Section  3.3.2. 

An  additional  item  extracted  from  the  ephemeris  was  the  B  angle,  defined  to 
be  the  angle  tilt  of  the  solar  north  pole  towards  or  away  from  the  observer  and 
orthogonal  to  the  solar  ecliptic  [Seidelmann  and  Urban,  2010].  This  angle  is 
continuously  changing  as  it  is  a  function  of  time  as  the  Earth  orbits  around  the  sun. 
When  determining  the  coordinate  system  to  use  on  the  sun,  the  B  angle  was 
important  to  establish  the  location  of  both  zero  longitude  and  latitude  [Thompson, 
2006].  If  the  B  angle  is  zero,  then  both  longitude  and  latitude  are  equal  to  zero  at 
the  very  center  of  the  sun.  However,  non- zero  B  angles  cause  the  latitude  to  shift 
either  up  or  down  on  the  disk.  The  Carrington  Longitude  is  neglected  when 
reporting  the  position  of  each  spot  group  [  Thompson ,  2006],  but  the  longitude  is 
still  affected  by  the  tilt  in  the  sun.  The  approximate  solar  diameter  in  radians  is 
also  extracted  from  the  ephemeris  because  it  helps  speed  the  calculation  of  the 
radius  in  later  steps. 
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Figure  5.  The  radius  of  the  sun  in  pixels  is  shown  to  linearly  change  over  the  month 
of  March,  2013. 


The  P  angle  is  defined  as  the  angle  of  rotation  between  the  solar  North  Pole 
and  the  celestial  North  Pole  as  seen  from  the  observer’s  position.  In  SDO  imagery, 
level  1  image  processing  corrects  for  the  P  angle  beforehand  ,  described  by  [ Lemen , 
2012],  so  the  SDO  algorithm  does  not  incorporate  a  correction  for  this  angle.  If  a  P 
angle  correction  is  needed,  perhaps  in  follow  on  research  to  apply  the  code  to  a 
wider  variety  of  images,  it  can  be  accounted  for  when  determining  the  longitude  and 
latitude  of  a  point  on  the  disk.  The  value  of  the  P  angle  for  any  given  time  is 
determined  from  the  ephemeris  and  was  calculated  in  the  program.  The  method  for 
this  calculation  is  specified  in  [ Meeus ,  1982],  but  is  not  essential  for  this  research. 
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3.3  Image  Preparation 


After  an  image  has  been  selected,  it  is  imported  with  its  counterpart  magnetic 
map  in  the  form  of  the  HMIM  image.  Both  HMII  and  HMIM  images  match  each 
other  in  terms  of  time,  a  luxury  not  available  to  previous  renditions  of  this  endeavor 
using  SOHO  images  [ Aschwanden ,  2010].  Additionally,  they  will  also  have  the  same 
spatial  resolution,  useful  in  future  steps  for  overlaying  the  two  images  as  points  will 
not  need  to  be  mapped  to  a  different  coordinate  system  for  processing.  Each  color 
band  of  the  HMII  image  is  not  necessary  for  any  steps  in  the  code  development,  but 
is  useful  in  visualization  of  calculated  features.  The  Red,  Green,  and  Blue  (RGB) 
image  is  converted  to  gray  scale  for  processing,  shown  as  number  1  to  number  2  in 
figure  6  by  averaging  the  elements  in  adjacent  color  bands.  This  choice  was  made 
because  each  color  band  does  not  yield  the  same  size  sun,  so  it  is  difficult  to  use 
them  in  conjunction.  In  the  future,  starting  with  grayscale  images  may  be  more 
practical. 

With  some  previously  implemented  methods  [Watson  and  Fletcher,  2010; 
Watson ,  2012],  sunspot  tracking  was  the  primary  focus.  Other  previously  explored 
automatic  recognition  and  classification  methods  require  a  computation  of  longitude 
and  latitude  using  a  single  image  for  sunspot  recognition  and  tracking  purposes 
outlined  in  the  [ Aschwanden ,  2010]  review  of  feature  recognition  techniques.  In  this 
method,  it  is  necessary  to  know  the  pixel  location  of  the  center  of  the  disk  as  well  as 
the  radius  of  the  sun  in  pixel  space  if  spot  positions  are  to  be  mapped  to 
heliographic  coordinates  (or  any  other  coordinate  system).  Because  the  SDO 
algorithm  must  function  on  a  single  image  and  is  not  focused  on  any  type  of 
tracking,  the  route  of  edge  detection  first  was  pursued. 

A  combination  of  tracking  with  position  detection  is  possible  assuming  image 
reconstruction  is  used  after  edge  detection  for  recognition  of  sunspots.  This  method 
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Figure  6.  The  color  Intensitygram  image  is  read  into  MATLAB  and  displayed  in  section 
one.  Section  two  displays  the  grayscale  image  created  by  averaging  the  red,  green, 
and  blue  planes  of  the  color  image  and  normalizing  to  the  uint8  scale.  Section  three 
represents  a  filled  black  and  white  image  post  canny  edge  detection,  demonstrating  the 
extent  of  the  sun  to  be  used  in  the  calculation  of  the  radius  and  center  of  the  sun,  as 
displayed  in  the  fourth  section. 


of  morphological  reconstruction  has  been  previously  preferred  for  its  speed,  but 
morphological  methods  like  reconstruction  are  non-linear  processes  that  alter  the 
data  in  an  unrecoverable  way  [Gonzalez  and  Eddins,  2009].  In  order  to  avoid  this 
loss  of  data,  the  SDO  method  minimizes  morphological  processing  and  instead  uses 

28 


an  iterative  thresholding  technique  to  grow  spots  in  a  similar  manner  without  using 
reconstruction.  This  idea  is  explored  in  Section  3.4. 

3.3.1  Edge  Detection  Process. 

The  next  step  in  algorithm  development  is  edge  finding.  The  ultimate  goal  of 
defining  the  edges  on  the  sun  is  to  find  both  the  center  of  the  sun  in  pixel  space  as 
well  as  the  radius  in  pixels  of  the  sun.  Representing  the  sun  as  a  circle  is  the  first 
step  to  accomplishing  this  goal.  The  edge  detector  selected  to  perform  this  task  uses 
the  Canny  technique  for  edge  definition  [Canny,  1986]  shown  in  Figure  6  as  Image 
3.  Because  of  the  large  size  of  the  images  used  (approximately  4-5  MBs),  the  kernal 
size  convolved  with  the  image  during  the  Gaussian  smoothing  step  to  define  edges  is 
minimized  in  order  to  limit  the  number  of  calculation  steps.  In  addition  to  limiting 
the  calculation  steps,  reducing  the  size  of  the  convolution  kernal  (the  same  sigma 
described  by  Figure  5)  decreases  the  position  error  of  the  edge  because  a  larger 
smoothing  filter  tends  to  introduce  ambiguity  in  the  line  being  detected  [Gonzalez 
and  Eddins,  2009].  When  determining  the  size  of  the  radius  as  well  as  the  center  of 
the  sun,  the  edge  must  be  roughly  circular  in  order  to  determine  a  best  fit  circle. 
This  means  that  there  is  a  trade  off  between  position  error  and  minimizing  kernal 
size  for  speed  and  maximizing  kernal  size  for  ease  in  the  determination  of  the  radius. 
A  default  value  of  3  is  used  for  the  width  and  height  of  the  Laplacian  of  a  Gaussian 
filter,  defined  to  be  the  weighted  kernal  type  applied  to  an  image  during  a 
convolution  step.  The  default  value  is  increased  in  steps  of  two  if  a  radius  and 
center  is  not  found  for  a  particular  iteration  of  circle  detection  in  order  to  keep 
symmetry  in  the  kernal. 

Edges  are  calculated  through  the  evaluation  of  a  the  local  gradient  inside  the 
convolution  kernal.  When  pixel  values  associated  with  each  element  inside  the 
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kernal  change  rapidly,  the  gradient  for  that  kernal  is  large.  Above  a  defined 
threshold,  pixels  that  satisfy  the  gradient  become  white,  while  other  pixels  are  set  to 
be  black  resulting  in  a  binary  image.  This  image  is  then  filled  and  processed  by  the 
center  and  radius  calculations  described  in  the  next  section.  Iteration  for  edge 
detection  is  generally  unnecessary  as  both  the  radius  and  center  values  are  usually 
found  on  the  first  try  because  of  the  acurate  approximation  of  the  size  of  the  sun  in 
pixels  from  the  ephcmeris.  If  additional  iterations  are  needed,  the  processing  time 
needed  to  complete  the  image  increases,  and  the  accuracy  of  the  radius  decreases  by 
one  to  two  pixels,  shrinking  the  value  of  the  calculated  radius.  Upon  testing  this 
method  for  over  a  months’  worth  of  images  (approximately  2500  images  from  March 
of  2013  seen  in  Figure  5),  images  that  required  a  sigma  value  of  5  or  more  yielded  a 
radius  value  of  about  2  pixels  less  than  the  average.  This  is  corrected  by  adding  a 
factor  of  to  the  radius  for  values  that  needed  a  sigma  value  over  3.  This  need 
for  this  correction  factor  can  be  seen  in  Figure  5  in  that  higher  sigma  values  give 
lower  radii  values. 

3.3.2  Center  and  Radius  Determination. 

The  second  part  of  this  iterative  loop  is  the  determination  of  the  center  and 
radius  of  the  sun  in  each  image.  This  is  done  using  a  circular  hough  transform, 
encompassed  in  MATLAB’s  imhndcircles  function  [ Gonzalez  and  Eddins,  2009]. 

The  circular  hough  transform  takes  a  binary  image  and  maps  all  points  with  a  value 
of  one  in  real  space  into  parameter  space.  Points  that  lie  on  perfect  circles  in  real 
space  will  be  mapped  to  a  single  point  in  parameter  space  with  coordinates 
representing  the  descriptive  parameters  defining  that  circle  in  real  space  [ Gonzalez 
and  Eddins ,  2009].  However,  the  discrete  nature  of  binning  in  the  CCD  camera  as 
well  as  the  slight  oblations  of  the  sun  causes  an  imperfect  circle  representation  in  an 


30 


image.  In  practice  therefore,  points  are  mapped  to  parameter  space  where  axis 
define  the  radius  and  center  of  all  possible  circles.  Points  are  given  a  weight  based 
on  their  proximity  to  every  other  point  in  parameter  space.  Regions/local  areas 
with  enough  points  to  satisfy  a  threshold  value,  selected  by  the  user  through 
specification  of  criteria,  are  fitted  to  yield  a  single  point  that  best  represents  all  the 
points  in  that  group.  The  selected  point  is  mapped  back  to  real  space,  giving  both 
the  approximate  radius  and  center  that  best  describes  the  combination  of  points 
constructing  the  circle.  This  method  yields  sub-pixel  accuracy,  even  on  full  4096  x 
4096  images  [ Gonzalez  and  Eddins,  2009].  In  using  the  circular  hough  transform  in 
MATLAB,  a  user  can  specify  certain  values  that  pertain  to  both  the  threshold  level 
desired  as  well  as  the  radius  range  in  which  points  should  be  tested  [Gonzalez  and 
Eddins,  2009].  The  threshold  level  is  set  through  the  alteration  of  a  sensitivity 
parameter  that  was  set  to  0.99  in  order  to  limit  the  number  of  circles  found.  For 
reference,  a  value  of  1  corresponds  to  a  perfect  circle,  likely  an  impossibility  in  a 
digital  image.  The  radius  range,  on  the  other  hand,  can  be  hard  coded,  but  it  is 
simpler  to  pull  out  an  approximate  value  and  decide  the  range  based  on  this  initial 
guess  of  a  radius.  Because  the  SDO  is  in  orbit  around  the  Earth  and  therefore  does 
not  follow  a  perfectly  circular  orbit  around  the  sun,  the  radius  of  the  sun  as  viewed 
from  the  SDO  position  changes  throughout  the  year.  This  change  can  be  predicted 
using  the  ephemeris  of  the  sun,  as  shown  in  Section  3.2.  The  eccentricity  of  the 
Earth’s  orbit  allows  the  calculation  of  radial  position: 


1  +  ecos(z/) 

where  v  is  the  true  anomaly  (shown  in  Figure  4)  of  the  SDO  with  respect  to  the  sun 
and  e  is  the  eccentricity  of  the  orbit  [ Meeus ,  1982],  Although  v  is  numerically 
approximated  and  the  value  of  r  will  not  be  exact,  this  value  can  still  be  used  to  get 
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a  good  guess  for  the  radius  of  the  sun.  This  radius  will  be  in  SI  units,  so  the 
constant  of  proportionality  mentioned  in  Section  3.2  will  be  needed  to  obtain  the 
approximate  radius  in  pixel  space.  From  here,  our  program  uses  this  number  to  put 
an  upper  and  lower  bound  on  the  radius  range  through  which  MATLAB  will  search 
for  circles.  A  range  of  1%  variation  above  and  below  the  radius  guess  is  sufficient  to 
find  the  center  and  radius  of  the  Sun  in  all  images.  This  approach  also  significantly 
reduces  the  computation  time  required  to  find  the  radius  as  the  range  required  to 
test  is  minimal. 

If  a  faster  result  is  required  in  order  to  get  a  quick  classification,  there  is  an 
option  to  compress  the  image  before  determining  the  center  and  radius.  This 
compression  combines  pixels  in  a  non-linear  fashion  and  therefore  introduces  small 
errors  in  center  position  and  radius  depending  on  the  amount  of  compression  used. 
Additional  compression,  decreases  the  processing  time  required  to  determine  the 
edge  of  the  sun,  but  there  is  a  trade  off  with  accuracy.  It  should  be  noted  that  the 
compressed  image  is  not  used  in  subsequent  calculations,  this  compression  is  solely 
used  to  calculate  the  radius  and  center  of  the  sun.  In  practice,  no  compression  is 
applied  to  processed  images  to  ensure  the  most  accuracy  possible.  In  addition,  this 
step  is  not  necessarily  the  most  computationally  intensive  step  and  in  reality,  does 
not  add  too  much  time  to  the  total  computation  time  for  each  image. 

3.3.3  Limb  Darkening  Correction. 

The  next  step  in  this  process  is  to  correct  for  the  Limb  Darkening  effect 
outlined  in  Section  2.3  .  The  simplest  way  to  complete  this  correction  is  to  create  an 
inverse  image  of  solely  quiet  sun  with  the  same  radius.  In  this  case,  both  the  inverse 
quiet  sun  image  could  be  added  to  the  image  being  processed,  simply  canceling  out 
any  curvature  in  the  intensity  and  leaving  a  flattened  sun. 
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In  order  to  create  the  inverse  solar  image  to  add  to  the  SDO  image  being 
processed,  an  initial  matrix  is  created  and  given  the  value  of  one  at  every  position. 
Next,  a  linear  range  spanning  the  size  of  the  solar  image  is  mapped  to  the  ones 
matrix  to  create  a  type  of  coordinate  system  in  matrix  form.  The  limb  darkening 
approximation  known  as  the  Eddington  approximation  [. Foukal ,  2008]  is  then 
mapped  to  this  coordinate  matrix  using  the  matrix  as  a  position,  resulting  in  an 
image  containing  intensity  values  corresponding  to  the  Eddington  approximation 
curve.  This  calculation  is  performed  using  the  relation 

j  1  i  +  Vip 
l  +  y/3 

where  I  represents  the  intensity  lost  on  the  limb  and  p  is  the  angle  between  the 
center  of  the  Sun  and  the  location  being  corrected  [Foukal,  2008].  Both  the  center 
pixel  obtained  in  the  edge  finding  routine  and  the  radius  of  the  sun  in  the 
corresponding  calculation  are  included  in  this  approximation.  The  background  of 
the  image,  meaning  the  part  of  the  image  that  lies  outside  the  radial  distance  from 
the  center  of  the  sun,  was  given  an  intensity  value  equal  to  the  quiet  sun  to  simplify 
the  sunspot  detection  phase.  This  image  can  subsequently  be  added  to  the  SDO 
image  from  which  it  was  constructed  in  order  to  obtain  a  flattened  image  of  the  sun 
to  a  first  order  approximation,  as  shown  in  Figure  7. 

3.4  Thresholding 

Previous  methodology  for  determining  the  location  of  sunspots  in  a  given 
image  generally  involved  region  growing  techniques  through  the  use  of  morphological 
reconstruction  outlined  by  [ Aschwanden ,  2010].  This  process  uses  a  matrix  of 
starting  guesses  for  the  location  of  each  sunspot  in  the  image  known  as  the  marker 
image  as  well  as  the  original  image  for  a  mask.  The  purpose  of  the  mask  is  to  allow 
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Figure  7.  The  original  image  in  grayscale  is  plotted  both  as  an  image  and  using  a  three 
dimensional  surface  representation.  In  the  center,  the  correction  image  determined  by 
the  Eddington  Approximation  is  shown  as  an  image  and  in  three  dimensional  form. 
The  two  images  are  added  together  to  reproduce  the  corrected  ’’flat”  sun  shown  at  the 
bottom  of  the  figure.  This  final  image  can  inverted  and  thresholded  in  the  next  step 
to  determine  the  location  and  size  of  each  sunspot  on  the  image. 


regions  to  grow  outward  through  morphological  openings  to  the  maximum  extent  of 
the  spot  seen  on  the  original  image  [Gonzalez  and  Eddins ,  2009].  The  marker  image 
is  grown  through  gradual  opening  of  single  pixel  regions  until  the  edge  of  the  object 
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being  grown  is  reached  in  the  mask  image.  From  this  point,  reconstruction  continues 
until  the  resulting  image  doesn’t  change  with  subsequent  iterations.  This  final 
image  should  be  a  binary  image  with  a  full  representation  of  each  sunspot  grown  on 
the  solar  disk  in  a  black  and  white  format.  Reconstruction  of  this  type  depends 
heavily  on  the  starting  positions  in  the  marker  image  being  accurate.  Only  regions 
that  are  marked  will  be  grown  to  the  correct  extent  in  the  final  image.  If  a  spot  is 
not  marked,  it  will  be  missed  and  appear  dark  on  the  final  image.  Many  previous 
implementations  of  this  process  have  had  difficulty  finding  small  spots/pores  with 
their  marker  images  [Colak  and  Qahwaji,  2008;  Zharkov  et  al,  2005] 

In  order  to  prevent  this  potential  loss  of  spots  and  to  avoid  any  morphological 
non-linear  processes  that  may  also  be  necessary  to  reconstruct  the  sunspots  in  our 
image,  the  SDO  code  uses  a  simple  thresholding  method  to  obtain  the  necessary 
black  and  white  product  similar  to  [ Curto  et  al,  2008].  This  method  is  more 
susceptible  to  noise  on  the  image  as  well  as  any  granulation  visible  on  the  surface 
near  the  center  of  the  disk  due  to  the  fact  that  pixels  meeting  the  threshold 
requirement  are  accepted,  although  this  can  be  minimized  by  limiting  the  rate  of 
change  of  accepted  sunspots  discussed  in  the  following  section.  Thresholding, 
instead  of  opening  by  reconstruction,  is  a  simple  way  to  select  the  regions  to  be 
labeled  as  sunspots.  By  systematically  approaching  the  optimal  threshold  value  at 
which  the  spot  regions  will  be  defined,  this  method  avoids  missing  any  starting 
point  on  a  mask  required  in  the  morphological  reconstruction  of  the  image.  In  this 
way,  the  number  of  spots  is  solely  dependent  on  the  intensity  value  of  the  pixels 
used  to  describe  each  region. 

The  method  takes  advantage  of  the  noise  inherent  to  the  system  by  capping 
not  the  number  of  sunspots  detected,  but  the  amount  of  change  in  the  number  of 
sunspots  detected  between  different  threshold  levels.  This  detected  number  of 
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sunspots  is  found  through  the  use  of  a  connected  components  routine  [Gonzalez  and 
Eddins,  2009],  a  function  that  allows  for  clever  representation  of  a  black  and  white 
image.  When  the  black  and  white  image  is  acquired,  a  small  lxl  kernal  is  used  to 
scan  the  image.  When  a  white  pixel  is  found,  it  is  labeled  one  (or  whichever  region 
is  being  marked  at  that  time).  Next,  the  adjacent  pixels  are  checked  for  color;  if  any 
are  white,  they  are  also  labeled  one.  This  process  is  continued  until  all  pixels 
adjacent  to  the  region  are  black  and  there  are  no  other  white  pixels  connected  to 
the  region  [ Gonzalez  and  Eddins ,  2009].  After  the  region  is  defined,  the  function 
continues  scanning  for  the  next  white  pixel  that  hasn’t  been  labeled  (labeling  the 
pixel  with  2,  3  and  so  on),  completing  the  same  process  until  the  entire  image  has 
been  treated.  This  black  and  white  labeling  technique  can  be  seen  in  a  simplified 
example  shown  in  Figure  8.  The  product  of  the  labeling  function  is  a  matrix  with 
each  region,  or  in  this  case  spot,  labeled  with  a  different  number  in  the  same 
location  that  region  appears  on  the  original  image. 


Figure  8.  Images  are  convolved  with  the  lxl  kernal  shown  as  green  square.  The  pixels 
adjacent  to  the  kernal  (shown  in  red)  are  checked  for  matching  pixel  values.  White 
pixels  start  out  with  a  value  of  1  while  black  pixels  are  0.  After  the  labeling  technique, 
black  pixels  still  have  a  value  of  zero  while  all  white  pixels  are  are  given  a  pixel  value 
equal  to  their  region  number. 
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The  threshold  at  which  the  grayscale  image  is  treated  starts  at  a  value  of  0.8 
in  which  few  pixels  are  determined  to  be  dark  enough  to  satisfy  the  threshold 
requirement.  Next,  beginning  with  larger  increments  at  first,  the  threshold  is 
dropped  in  a  sequential  order.  Every  time  the  threshold  value  is  dropped,  the 
function  runs  a  connected  components  routine  that  only  returns  the  number  of 
spots  detected.  This  number  is  subsequently  compared  to  the  initially  calculated 
number  of  spots  to  get  a  rate,  and  if  the  growth  is  below  a  set  expansion  rate,  the 
threshold  is  permitted  to  drop  once  again.  The  threshold  will  continue  to  drop  in 
this  manner  until  the  growth  jumps  up  significantly,  indicating  that  the  method  is 
beginning  to  pick  up  a  large  amount  of  noise.  When  this  happens,  the  threshold 
increment  previously  subtracted  is  added  back  to  the  point  where  no  noise  was 
detected,  and  the  increment  by  which  the  threshold  level  is  dropped  is  reduced  by  a 
factor  of  ten.  In  addition  to  the  increment  by  which  the  threshold  level  is  changed, 
the  amount  by  which  the  number  of  spots  can  change  is  reduced.  Monitoring  the 
change  between  a  sequence  of  threshold  levels  is  important  for  determining  when 
the  noise  threshold  has  been  reached,  but  if  the  value  of  change  to  which  each 
iteration  is  compared  is  fixed,  the  method  really  only  accepts  up  to  a  particular 
number  of  spots  in  the  image.  Therefore,  it  is  necessary  to  alter  the  value  of  the 
change  threshold  so  that  the  number  of  spots  is  permitted  to  grow  to  any  number  as 
long  as  the  spots  are  dark  enough  to  be  distinguished  from  the  noise  in  the  image. 
The  number  of  accepted  change  in  spot  counts  starts  large  (50)  and  decreases  to  a 
small  number  (1)  with  multiple  iterations.  The  process  of  iterative  reduction  is 
repeated  until  the  step  size  by  which  the  threshold  is  reduced  reaches  a  magnitude 
of  10-4  accuracy  at  which  point  the  resulting  threshold  level  is  applied  to  the 
corrected  gray  image  for  the  resulting  black  and  white  image  that  is  subsequently 
labeled  using  the  same  connected  components  routine  [Gonzalez  and  Eddins,  2009]. 
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This  method  performs  exceptionally  well  when  working  to  define  the  extent  of 
the  penumbra  from  the  darkest  areas  in  each  spot  groups.  As  the  threshold  level  is 
dropped,  the  darkest  portions  of  the  image  become  accepted  first.  From  these 
points,  lower  threshold  values,  in  practice,  will  incorporate  points  surrounding  these 
dark  spots,  effectively  growing  the  size  of  each  spot  region.  This  growth  is  permitted 
to  the  point  where  a  lower  threshold  value  begins  to  incorporate  noise,  at  which 
point,  the  method  has  defined  the  penumbra  completely  within  the  image.  Typical 
threshold  values  determined  to  be  acceptable  by  visual  inspection  generally  end  up 
at  approximately  0.35  (also  interpreted  as  35%  the  intensity  of  the  brightest  pixel). 
Combining  these  results  with  knowledge  of  studies  showing  umbra  intensity  values 
typically  prove  to  have  15%  the  intensity  of  the  brightest  part  of  the  sun  [Foukal, 
2008],  the  umbral  regions  can  be  defined  as  the  pixels  that  are  20%  darker  than  the 
penumbral  regions.  To  do  this,  the  corrected  gray  image  is  converted  to  black  and 
white  again,  but  at  a  threshold  level  that  is  0.2  higher  than  the  threshold  level  used 
to  define  the  penumbra.  This  level  does  a  good  job  capturing  all  the  pixels  an 
observer  would  consider  umbra  on  the  image.  The  finalized  black  and  white  umbra 
image  is  also  returned  to  the  function  for  comparison  to  the  penumbra  image. 

3.5  Group  Definition 

With  the  labeled  matrix  containing  each  of  the  penumbra  regions  defined  by 
our  thresholding  routine,  the  next  step  involves  separating  each  of  these  regions  into 
their  respective  groups.  Group  definition  involves  an  iterative  loop  that  tests  each 
region  against  a  set  of  criteria  to  determine  if  that  region  belongs  in  the  group 
currently  being  evaluated  based  on  the  McIntosh  paper  [ McIntosh ,  1990].  This 
process  is  computationally  expensive  due  to  the  multiple  criteria  that  go  into 
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determining  which  spots  belong  in  what  group.  This  section  of  code  therefore  takes 
up  the  largest  amount  of  time. 

It  is  first  important  to  establish  the  criteria  for  determining  whether  a  spot 
belongs  in  a  group.  Because  the  SDO  method  will  be  compared  with  the  product  of 
solar  analysts  at  Holloman  AFB,  it  is  important  to  use  the  same  criteria  in  order  to 
establish  a  baseline  to  compare  the  same  groups  for  the  same  images  [USAF,  2013]. 
Results  will  be  meaningless  if  comparisons  are  made  between  two  different  products. 
All  analysts  follow  the  AFWAI,  but  subjectivity  in  the  classification  process  can 
sometimes  cause  analysts  to  disagree  on  groupings.  Groups  are  defined  using  a 
combination  of  length  between  spots  in  heliographic  degrees  and  the  different 
magnetic  polarities  of  each  spot  being  considered.  For  a  spot  pair  that  has  the  same 
polarity  and  small  angular  separation,  the  pair  will  be  considered  to  be  the  same 
group  if  the  distance  between  the  two  spots  is  less  than  or  equal  to  3  heliographic 
degrees.  All  the  spots  within  3  degrees  of  the  centroid  of  a  group  therefore  will  be 
considered  part  of  that  same  group.  If  there  is  separation  outside  of  3  degrees,  spots 
of  the  same  magnetic  polarity  will  not  be  considered  part  of  the  same  group. 
However,  if  the  polarity  of  the  two  spots  being  considered  for  grouping  is  opposite, 
larger  length  values  are  considered  appropriate.  As  long  as  there  are  spots  along  the 
path  between  the  first  spot  in  the  group  and  the  spot  being  evaluated  that  have 
alternate  polarities,  the  evaluated  spot  will  be  considered  part  of  the  group  beyond 
15  degrees  separation  between  the  centroid  of  the  group  and  the  new  spot  [USAF, 
2013], 

In  order  to  implement  this  process  in  MATLAB,  the  label  matrix  provided 
from  the  connected  components  routine  is  required  [Gonzalez  and  Eddins,  2009]. 
Additionally,  points  taken  from  the  image  need  to  be  transformed  into  heliographic 
coordinates  in  order  to  evaluate  the  true  distance  between  each  spot.  Therefore, 
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before  proceeding,  it  is  important  to  establish  the  mapping  system  used  to  place 
positions  in  pixel  space  into  real  hcliographic  coordinates. 

3.5.1  Distance  Determinations. 

Distance  between  spots  is  one  of  the  two  important  factors  used  for 
determining  spot  groupings  (the  other  being  spot  polarity).  The  mapping  function 

cos (B)  cos (L  —  L0)  =  [cos(.E>o)  cos(p)  ±  sin(I?0)  sin(p)  cos(0  —  P0)]  (10) 

can  be  used  to  determine  longitude  and  latitude  positions  for  each  pixel  determined 
to  be  part  of  a  sunspot  (See  Appendix  2  for  the  complete  derivation).  After 
determining  the  location  in  pixel  space  of  a  particular  spot,  the  center  and  radius 
values  for  the  sun  are  used  in  conjunction  with  B  angle  values  taken  from  the  solar 
ephemeris  to  calculate  spot  positions  in  terms  of  hcliographic  coordinates.  The  spot 
position  is  subsequently  compared  to  candidate  groupings.  Spot  groups  generally 
span  from  right  to  left  as  seen  on  the  disk  because  the  leading  spot  is  generally 
closer  to  the  equator  [ Foukal ,  2008],  but  if  a  circular  radius  from  the  center  of  the 
spot  group  is  used  to  determine  which  other  spots  belong  in  the  group,  spots  may 
be  admitted  that  don’t  necessarily  line  up  with  the  East- West  paradigm.  With  this 
in  mind,  a  variable  distance  function  is  constructed  weight  the  North-South 
direction  more  heavily  so  that  groups  will  tend  to  form  in  a  left  to  right  fashion. 
This  is  done  using  the  following  latitude/longitude  comparison: 
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where  L  is  the  longitude  and  B  is  the  latitude.  In  addition  to  defining  the  distance 
in  terms  of  heliographic  coordinates,  it  is  also  important  to  establish  an  effective 
way  for  determining  the  magnetic  polarity  of  each  detected  spot  to  compare  with 
every  other  spot  [Foukal,  2008].  To  do  this,  a  function  is  run  that  creates  a  logical 
matrix  containing  ones  at  the  position  of  the  spot  group  being  analyzed,  and  this 
matrix  is  multiplied  into  the  HMIM  image  to  zero  out  all  pixels  that  are  not  part  of 
the  current  region  on  the  HMIM  image.  Next,  the  pixels  are  summed  and  the  total 
is  divided  by  the  number  of  pixels  in  the  region  to  get  an  average  pixel  value  of  the 
group.  If  the  average  is  below  127.5,  meaning  half  of  the  maximum  pixel  value,  the 
region  is  labeled  as  having  polarity  1.  Otherwise,  the  region  is  given  a  polarity  label 
of  2.  It  should  be  mentioned  that  there  is  a  possible  scenario  where  a  spot  group 
has  a  5  configuration  on  the  Mt  Wilson  scale  (two  separate  polarities  within  the 
same  penumbra).  This  would  mean  that  part  of  the  spot  group  would  have  a  pixel 
value  of  less  than  the  127.5  cutoff  while  the  other  section  would  be  above  the  cutoff. 
In  this  case,  the  function  would  simply  label  the  spot  with  the  predominant  polarity 
in  the  region.  This  may  make  it  difficult  to  expand  the  grouping  method  if  a  Mt 
Wilson  classification  scheme  is  desired  later  on,  but  for  this  research,  it  is 
unnecessary  .  In  the  case  where  a  spot  waiting  to  be  grouped  has  magnetic  polarity 
above  the  cutoff  and  the  trailing  spots  in  the  group  is  also  above  the  cutoff,  there 
may  be  an  issue  with  accepting  the  second  spot  if  no  other  spots  in  the  region  have 
a  low  dominant  pixel  value  indicating  the  opposite  polarity,  but  this  situation  may 
rarely  be  the  case  and  has  never  been  observed.  With  this  new  tool  set,  each  spot 
can  now  be  compared  effectively  in  a  manner  that  appropriately  groups  the  proper 
spots  on  the  sun. 

Grouping  of  detected  sunspots  is  a  problem  that  has  been  addressed  multiple 
different  ways  and  is  the  subject  of  ongoing  research,  summarized  by  [ Aschwanden , 
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2010]  and  [  Ver beech  et  al,  2013].  In  order  to  prevent  subjectivity  form  affecting  the 
SDO  system,  this  algorithm  for  grouping  is  defined  in  a  manner  that  is  precise  and 
specific  with  regard  to  the  physical  characteristics  of  each  spot.  In  other  words,  the 
algorithm  is  set  up  accepting  some  difference  between  reported  spot  groupings  in 
favor  of  a  method  that  is  strictly  defined  in  a  way  that  can  always  be  reproduced 
given  the  same  variables.  With  this  thought  in  mind,  it  should  be  noted  that  the 
algorithm  does  not  attempt  to  diverge  from  the  standard  grouping  technique,  but  is 
constructed  in  favor  of  a  consistently  biased  result.  The  results  here  are  supposed  to 
be  similar  to  those  produced  at  all  solar  observatories  as  the  grouping  rules  are  still 
based  on  the  McIntosh  grouping  techniques  [ McIntosh ,  1990].  In  many  ways,  this 
new  grouping  technique  attempts  to  mimic  the  process  of  thinking  that  goes 
through  an  analyst’s  mind  before  choosing  the  grouping  for  a  spread  of  sunspots 
[USAF,  2013].  The  difference  comes  in  that  specific  levels  are  laid  out  in  a  way  that 
prevents  overlap  and  when  differences  are  found  in  grouping,  the  code  user  should 
be  able  to  see  why  those  differences  arose. 

3.5.2  Grouping  Length. 

Grouping  length  is  different  from  a  sunspot  group  length,  the  former  being  the 
parameter  used  by  the  SDO  grouping  technique  to  test  candidate  spots  for 
acceptance  into  a  specific  group.  After  attempting  to  optimize  the  grouping  length, 
the  conclusion  was  reached  that  certain  types  of  groups  are  grouped  properly  for  a 
corresponding  grouping  length.  This  means  that  one  group  may  benefit  by  having  a 
grouping  length  of  10  while  another  group  may  be  correctly  grouped  for  a  grouping 
length  of  15.  For  groups  that  were  large  and  had  extensive  and  complex  structures 
with  many  umbra  in  the  group,  a  larger  grouping  length  performed  better.  However, 
using  this  same  grouping  length  for  smaller  sunspots  tended  to  lump  spots  together 
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that  did  not  belong  in  the  same  group.  Conversely,  small  spot  groups  would  favor  a 
shorter  grouping  length,  but  their  large  group  length  counterparts  would  be 
over-partitioned  by  the  grouping  algorithm,  leaving  spots  outside  the  group  that 
actually  belonged  inside.  In  this  way,  it  is  impossible  to  optimize  the  grouping 
length  for  all  group  sizes.  It  was  realized  then  that  rather  than  choose  a  single 
grouping  length  and  accept  error  for  certain  group  types,  the  grouping  length  could 
be  altered  during  the  segmentation  process.  The  process  of  grouping,  therefore,  is 
based  on  the  area  of  each  spot  group  to  in  order  to  give  a  range  of  grouping  lengths 
for  different  group  types.  An  analyst’s  eyes  are  drawn  to  the  areas  of  the  solar  disk 
that  have  the  greatest  population  of  spots  or  that  are  the  darkest,  consuming  large 
sections  of  the  solar  hemisphere.  Interpreting  this  observation  tendency  into  a 
coding  scheme,  the  SDO  code  alters  the  grouping  length  depending  on  the  area  of 
the  spots  currently  assigned  to  that  group.  For  spots  that  have  a  large  area,  a  larger 
grouping  length  is  assigned,  while  conversely,  small  area  groups  have  a  smaller 
grouping  length.  This  effect  takes  the  best  of  both  worlds  and  nicely  compromises 
the  two  extremes.  To  obtain  the  grouping  length  function,  three  approaches  were 
experimentally  determined.  Collecting  data  over  the  past  year  from  San  Vito, 
Learmonth  and  Holloman  as  well  as  10  years  of  SWPC  data,  a  set  of  points  is 
obtained  matching  group  lengths  to  the  total  area  of  that  spot  group.  Trial  1  took 
the  maximum  group  length  for  a  particular  area  to  create  our  data  set,  to  which  an 
exponential  curve  is  fit  resulting  in  an  equation  relating  area  to  group  length. 

Area 02276 

Length  =  8.276^-  -  2.875  (12) 

Trial  2  takes  the  average  value  of  group  length  for  a  particular  area  to  create  a  set 
of  data  points  to  which  an  exponential  curve  is  also  fit.  The  result  is  another 
equation  that  relates  area  to  group  length 
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-0.08423 


(13) 


Length 


-24.74 


Area 

10 


+  32.96 


The  final  trial  is  simply  a  linear  interpolation  of  points  selected  between  the  two 
curves,  resulting  in  the  most  accurate  results  after  testing.  Each  point  is 
approximately  equal  to  the  average  of  both  previous  curves.  These  curves  are 
summarized  in  Figure  9. 


Figure  9.  All  sunspot  groups  reported  by  SWPC,  Holloman,  Learmonth  and  San  Vito 
are  shown  in  terms  of  color  density,  plotted  with  the  area  of  the  spot  group  on  the  x  axis 
and  the  total  length  of  the  spot  group  on  the  y  axis.  The  three  AF  Solar  Observatories 
make  up  a  years  worth  of  data  each,  while  SWPC  contributes  10  years  of  spot  data. 
The  first  exponential  curve  is  the  result  of  fitting  the  maximum  group  length  for  each 
area.  The  second  exponential  curve  is  the  result  of  fitting  the  average  group  length  for 
each  particular  area.  The  final  curve  is  approximately  the  average  value  between  the 
first  two  curves,  cut  off  at  a  group  length  of  20. 
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To  calculate  the  area  of  a  spot,  the  code  applies  the  same  general  steps  used 
by  Holloman  AFB.  Because  the  locations  of  each  sunspot  are  known  in  pixel  space 
and  each  sunspot  is  represented  in  a  black  and  white  image,  we  sum  up  all  of  the 
pixels  that  are  representing  each  sunspot.  This  yields  an  area  in  pixel  space  that  is 
then  converted  to  a  percentage  of  the  total  area  of  the  sun.  By  dividing  A,  the  area 
in  pixels 


Area  = 


A 

2nr2 


(14) 


by  the  area  of  a  solar  hemisphere  in  pixel  space  where  r  is  the  solar  radius  in  pixels. 
The  resulting  number  is  then  multiplied  by  the  geometric  foreshortening  correction 
factor  that  is  found  by  determining  the  angle  p  off  from  center  that  describes  the 
sunspot’s  position  which  can  be  found  by  taking  the  inverse  cosine  of  equation  22. 
The  foreshortening  correction  factor  is  found  by 


fc 


1  x  106 
P 


(15) 


where  it  is  then  multiplied  by  1  x  106  to  put  the  area  in  units  of  millionths  of  a  solar 
hemisphere  [USAF,  2013]. 


3.5.3  Grouping  Process. 

In  order  to  perform  the  grouping,  the  variable  grouping  length  method  is  used 
based  on  the  total  area  of  sunspots  already  assigned  to  a  group.  Starting  with  the 
biggest  spot  visible  on  the  disk,  new  spot  candidates  are  tested  and  accepted  into 
the  group  based  on  the  grouping  length  defined  in  Section  3.5.2.  To  do  this,  a  case 
statement  must  be  set  up  inside  a  loop.  The  purpose  of  this  initial  loop  is  to  iterate 
through  every  numbered  spot  contained  in  the  label  matrix  from  the  thresholding 
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routine  in  Section  3.4.  The  case  statement  is  needed  to  check  the  existence  of  the 


second  spot  being  compared  to  the  first  spot.  In  the  case  where  the  spot  being 
compared  has  already  been  assigned  to  another  group,  this  statement  skips  to  the 
next  spot  to  compare.  In  the  case  where  the  spot  does  exist  and  has  not  been  given 
a  group  number  yet,  a  second  loop  is  started.  The  purpose  of  this  second  loop  is  to 
establish  whether  or  not  there  are  spots  within  the  grouping  distance  defined  by  the 
area  of  the  current  spot  group.  These  other  spots  must  have  the  opposite  polarity  of 
the  current  spot  being  compared.  In  addition,  if  a  spot  being  tested  lies  within  5 
heliographic  degrees,  it  is  labeled  as  part  of  the  current  group.  If  a  spot  with 
opposite  polarity  does  exist  within  5  heliographic  degrees,  the  polarity  criteria  is 
met,  and  a  third  loop  is  triggered.  This  third  loop  also  moves  through  each  spot  not 
yet  labeled,  but  instead  of  the  5  degree  separation  as  the  criteria  for  group  labeling, 
there  is  an  extended  limit  based  on  the  area  of  the  spot  group  using  the  average  of 
equations  12  and  13.  If  the  bipolar  criteria  is  triggered,  the  current  group  is  marked 
as  a  bipolar  group  at  the  end  of  this  loop.  The  final  product  of  this  grouping 
algorithm  is  a  label  matrix  that  contains  the  number  of  groups  in  the  image 
according  to  the  Holloman  AFB  grouping  criteria  [USAF,  2013]. 

3.6  Feature  Extraction 

The  final  stage  of  processing  before  the  classification  scheme  involves 
extracting  information  out  of  the  image  using  a  data  representation  function 
[Gonzalez  and  Eddins,  2009].  The  end  goal  for  this  section  of  code  is  to  come  up 
with  a  way  to  represent  the  McIntosh  classification  in  terms  of  extractable  features 
from  the  label  matrix  that  has  already  been  created.  In  MATLAB,  key  features  are 
removed  via  the  function  regionprops  [Gonzalez  and  Eddins,  2009]  which  takes  a 
label  matrix  along  with  any  number  of  specified  parameter  values  to  calculate 
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certain  items  that  are  useful  for  classifications.  These  key  features  include  the  length 
of  the  sunspot  group,  length  of  largest  sunspot  penumbra,  eccentricity  of  the  largest 
sunspot,  area  of  the  sunspot  group,  polarity  of  each  sunspot,  the  completeness  of 
the  penumbra,  and  the  spread  of  sunspots  within  the  group.  When  the  McIntosh 
system  explicitly  defined  the  classification  parameters,  those  values  were  used. 

When  wording  was  unspecific,  the  words  were  interpreted  in  the  most  logical  way. 
These  interpretations  are  stated  explicitly  throughout  Sections  3.6. 1-3. 6. 3. 

It  is  difficult  to  compare  the  intensity  values  between  different  sunspots  on  a 
white  light  board  without  significant  experience.  Additionally,  it  is  difficult  for  an 
analyst  to  discern  the  relative  darkness  of  the  penumbra  compared  to  the  umbra  of 
the  spot.  All  of  this  information  is  not  necessary  when  drawing  on  a  piece  of  paper, 
but  is  essential  when  determining  the  extent  of  the  penumbra  and  umbra  on  a 
digital  image.  When  small  spots  are  detected  on  a  white  light  board,  there  may  not 
be  enough  spatial  resolution  to  either  determine  the  darkness  of  each  spot,  or  the 
potential  presence  of  penumbra.  In  a  digital  image  created  by  a  CCD,  however,  the 
computer  needs  to  interpret  each  pixel  as  a  digital  count  value  [ Gonzalez  and 
Eddins,  2009].  The  formation  of  sunspots  described  by  the  Babcock  model  is  driven 
by  suppression  of  plasma  flow  in  the  photosphere  [ Foukal ,  2008].  Because 
suppressed  plasma  can  no  longer  circle  back  down  to  the  radiative  zone  to  re-heat, 
the  intensity  of  light  in  the  visible  spectrum  decreases  and  the  regions  of  higher 
magnetic  held  appear  darker  than  those  regions  of  the  sun  that  are  quiet  with  little 
magnetic  activity.  Below  a  certain  intensity  value  on  each  image,  these  regions  of 
suppressed  convection  will  be  counted  as  sunspots  in  the  code,  as  discussed  in 
Section  3.4.  In  practice,  when  small  sunspots  are  detected  on  the  digital  image,  they 
will  pass  the  penumbra  threshold  but  may  rarely  pass  the  umbra  threshold.  These 
small  spots  must  still  be  considered  umbra,  though,  even  if  they  do  not  appear  on 
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the  umbra  image.  Therefore,  a  routine  is  developed  to  establish  whether  a  sunspot 
on  the  penumbra  image  has  a  co-located  umbra  spot.  If  this  is  not  the  case,  that 
spot  on  the  penumbra  image  is  considered  part  of  the  umbra  image,  even  though  its 
intensity  level  was  higher. 

3.6.1  Zurich  Classification. 

With  this  spot  definition  in  mind,  the  task  in  classification  becomes 
representing  each  detected  spot  group  in  the  McIntosh  fashion.  Under  the  Zurich 
Classification,  or  first  letter  designation,  the  length  of  the  sunspot  group  and 
completeness  of  penumbra  surrounding  the  leading  and  trailing  spots  are  primary 
factors.  These  items  are  shown  in  Figure  2.  To  start,  a  new  spot  is  defined  as  one 
that  possesses  umbra  with  little  to  no  penumbra.  These  types  of  spots  are  given  the 
designation  ‘A’  as  they  represent  the  first  stage  in  a  sunspot’s  life  at  the  lowest 
detectable  intensity  level.  If  a  spot  group  is  unipolar  and  has  a  defined  penumbra,  it 
will  be  given  an  ‘H’  designation.  In  practice,  there  are  usually  pixels  around  the 
central  dark  spot  that  satisfy  the  penumbra  threshold  level,  but  if  the  spot  is  small, 
these  are  neglected  because  their  intensity  may  have  to  do  with  the  binning  used  in 
the  CCD.  Because  of  this,  area  restrictions  are  applied  in  order  to  prevent  small 
spots  with  very  small  areas  from  receiving  ‘H’  classifications.  Because  ‘H’ 
classifications  are  supposed  to  be  for  higher  levels  of  a  sunspot’s  life,  it  is  not  logical 
to  allow  this  classification  for  groups  that  could  be  comprised  of  3-5  pixels. 
Therefore,  the  definition  of  the  ‘A’  designation  is  altered  to  include  unipolar  groups 
with  umbra  that  have  an  area  of  less  than  5  millionths  of  a  solar  hemisphere  in 
addition  to  the  other  criteria. 

As  the  spot  group  grows,  it  will  likely  develop  a  trailing  spot  with  penumbra 
intensity  levels,  traditionally  considered  a  spot  group  with  the  Zurich  designator  ‘B’. 
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For  cases  with  darker  spots,  the  presence  of  penumbra  is  the  indicator  for  full 
development  of  a  spot  [ Foukal ,  2008;  McIntosh ,  1990].  The  validity  of  any 
penumbra/umbra  combination  seen  on  the  image  is  checked  by  multiplying  the 
binary  image  of  the  penumbra  with  the  binary  image  of  the  umbra.  Given  that  both 
regions  will  overlap,  the  resulting  image  will  either  have  a  sum  of  zero  for  spots  that 
no  corresponding  mark  on  the  umbra  image,  or  non-zero  for  spot  groups  that  do 
have  the  corresponding  mark.  In  order  to  define  which  spot  groups  have  a  mature 
penumbra,  it  was  interpreted  that  the  overlap  could  not  be  greater  than  85%  of  the 
total  area  of  the  penumbra  as  this  would  imply  that  the  majority  of  the  spot  group 
is  dark  umbra.  If  this  experimentally  determined  threshold  is  passed,  this  penumbra 
group  is  labeled  as  a  mature  spot.  Subsequently,  spot  groups  with  mature  spots  on 
one  end  of  the  group  but  not  the  other  end  will  be  considered  of  the  ‘C’  designation, 
while  spot  groups  with  mature  spots  on  both  ends  are  ‘D’.  This  determination  is 
done  by  calculating  the  maximum  extent  of  the  spot  group  in  all  directions  to  find 
the  length  of  the  spot  group.  Then,  iterating  through  all  the  mature  spots  in  the 
group,  the  distance  in  heliographic  degrees  between  each  mature  spot  is  determined. 
If  any  of  the  determined  distances  are  greater  than  70%  of  the  total  spot  group 
length,  this  is  interpreted  as  satisfying  the  condition  that  mature  spots  be  on  both 
ends  of  the  group.  This  cutoff  was  determined  during  the  development  process  to 
function  appropriately  in  classification  of  each  sunspot  group.  For  the  higher  classes 
of  ‘E’  or  ‘F’,  the  length  of  the  group  is  looked  at  in  addition  to  the  criteria  that  is 
necessary  to  designate  a  group  as  ‘D’.  The  necessary  values  for  each  classification 
are  defined  in  Section  2.5.1. 

The  possibility  of  detecting  a  spot  group  that  is  comprised  of  a  few  spots  in 
close  proximity  (meaning  a  few  pixels)  to  one  another  is  considered,  where  one  large 
spot  is  centered  among  a  few  small  spots.  This  group  would  be  considered  one 


49 


group  on  a  drawing  board,  but  because  of  the  additional  spatial  resolution,  parts  of 
the  spot  may  be  separated.  In  the  case  where  all  spots  detected  have  the  same 
polarity,  there  is  no  issue  and  the  group  will  be  labeled  ‘H’  in  line  with  what  an 
observer  would  say.  However,  in  the  case  that  there  is  fringing  and  part  of  the 
penumbra  has  an  opposite  polarity  to  the  rest  of  the  spot,  the  group  will  be  labeled 
bipolar  therefore  given  a  ‘C’  designation.  To  correct  for  this  possibility,  a  case 
statement  is  included  where  spot  groups  of  this  type  are  still  given  the  designation 
‘H’  but  will  still  be  considered  bipolar.  Specifically,  if  the  length  of  the  largest  spot 
in  the  group  makes  up  greater  than  85%  of  the  total  length  of  the  group,  in  both 
East  to  West  and  North  to  South  directions,  this  switch  is  made. 

3.6.2  Penumbra  Classification. 

The  penumbra  classification,  illustrated  by  series  2  in  Figure  2,  begins  by 
checking  the  Zurich  Classification  applied  to  the  spot  group  by  the  previous  section 
of  code.  If  the  first  letter  is  given  as  either  ‘A’  or  ‘B’,  there  can  be  no  penumbra 
visible  in  that  particular  spot  group  by  definition.  In  this  case,  these  groups  are 
assigned  a  penumbra  class  of  ‘x’  for  non-existent  penumbra,  following  the  rules  in 
Table  2. 

The  next  possible  outcome  is  one  where  the  penumbra  of  the  group  is  fairly 
limited  in  extent.  Each  classification  beyond  the  non-existent  penumbra  notation 
looks  at  the  penumbra  surrounding  the  largest  spot  in  the  group.  Because  of  this,  a 
connected  components  routine  [Gonzalez  and  Eddins,  2009],  illustrated  in  Figure  8, 
is  run  on  the  individual  spot  group  to  get  the  total  number  of  spots  and  to  separate 
those  spots  into  their  respective  sizes.  The  area  correction  is  applied  to  each  spot  in 
the  group  before  re-ordering  the  spots  from  largest  to  smallest.  After  this  point,  the 
largest  spot  is  addressed  and  the  umbra  contained  within  that  spot’s  penumbra  is 
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summed  and  divided  by  the  area  of  the  surrounding  penumbra.  If  the  ratio  of  these 
two  numbers  exceeds  0.5,  this  penumbra  is  considered  to  be  rudimentary  and  the 
label  ‘r’  is  given.  This  indicates  that  the  umbra  of  the  spot  group  makes  up  more 
than  half  of  the  total  area  of  the  spot.  The  penumbra  of  this  spot  group  therefore 
does  not  comprise  a  significant  portion  of  the  area  and  is  likely  underdeveloped. 

The  next  case  to  consider  is  for  fully  developed  penumbra.  Four  separate 
classes  can  be  assigned  depending  on  the  combination  of  eccentricity  and  length: 
the  spot  is  asymmetric  and  has  a  penumbra  length  less  than  2.5  degrees  (labeled 
‘a’),  the  spot  is  symmetric  and  has  a  penumbra  length  less  than  2.5  degrees  (labeled 
‘s’),  the  spot  is  asymmetric  and  has  a  penumbra  length  of  greater  than  2.5  degrees 
(labeled  ‘k’),  or  the  spot  is  symmetric  with  a  penumbra  length  of  greater  than  2.5 
degrees  (labeled  ‘h’).  The  length  of  the  group  is  determined  using  the  same 
technique  of  measuring  the  extreme  points  of  the  largest  spot  and  calculating  their 
position  in  heliographic  degrees  before  comparing  the  extent  to  find  the  span.  To 
find  the  eccentricity,  the  function  regionprops  [ Gonzalez  and  Eddins,  2009]  is  called 
with  the  eccentricity  option.  This  function  determines  the  ratio  of  the  semi-major 
axis  to  the  semi-minor  axis,  yielding  a  value  between  0  and  1.  If  the  object  is  as 
wide  as  it  is  tall,  the  function  will  return  an  eccentricity  of  zero,  indicating  a  circle. 
A  line  segment  will  receive  an  eccentricity  of  1.  The  value  corresponding  to  the 
largest  spot  in  the  group  determines  the  eccentricity  criteria.  An  eccentricity  of  less 
than  0.5  is  interpreted  to  be  symmetric  while  an  eccentricity  of  greater  than  0.5  is 
interpreted  as  asymmetric.  Each  of  these  pieces  of  information  are  compiled  to  fill 
in  the  rest  of  the  possible  scenarios. 
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3.6.3  Compactness  Classification. 


The  compactness  classification  is  given  based  on  the  concentration  and 
number  of  additional  spots  seen  in  the  group  being  analyzed,  illustrated  by  the 
third  series  in  Figure  2.  For  the  case  where  the  group  is  unipolar  and  the  first  letter 
of  the  classification  is  either  ‘A’  or  ‘IF,  there  can  be  no  spread  of  spots  between 
leader  and  trailer  spot  by  definition.  In  this  case,  the  compactness  classification  for 
these  unipolar  groups  is  given  as  ‘x’  or  non-existent. 

For  the  cases  where  the  group  is  bipolar,  three  separate  classifications  can  be 
assigned  based  on  the  observer’s  interpretation  of  three  similar  words:  few,  several, 
and  many  [ McIntosh ,  1990;  USAF ,  2013].  This  wording  ambiguity  is  shown  in 
Table  3.  In  this  research,  many  is  defined  as  greater  than  5  while  3  is  several.  For 
the  SDO  code,  the  compact  and  intermediate  classifications  are  addressed  first  and 
any  spot  groups  that  don’t  qualify  are  assigned  an  open  designation  ‘o’,  defined  in 
Section  2.5.3.  In  order  to  receive  a  compact  classification,  the  spot  group’s  trailing 
spot  must  have  mature  penumbra  in  addition  to  mature  penumbra  present  on  at 
least  one  spot  in  between  leader  and  trailer.  This  means  that  no  classification  below 
‘D’  can  be  compact  by  definition  combined  with  the  requirement  that  the  group 
must  has  greater  than  5  umbras.  In  practice,  the  position  of  each  umbra  in  each 
spot  group  is  calculated  with  respect  to  the  leading  spot.  If  umbra  is  present  on  the 
trailing  spot  in  addition  to  somewhere  in  between  the  leader  and  trailer,  the  group 
passes  the  first  condition.  The  counted  number  of  umbra  defines  the  second 
condition.  As  an  additional  requirement  due  to  the  fact  that  small  narrow  groups 
are  not  given  ‘c’  classifications,  an  area  to  length  requirement  is  set.  The  group’s 
area  divided  by  the  group  length  must  also  be  over  a  set  value  of  30  to  ensure  that 
no  groups  with  small  area  can  be  included  in  this  compact  category  as  a  small, 
narrow  group  would  otherwise  be  considered  satisfactory. 
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It  was  selected  that  groups  with  greater  than  3  individual  spots  with  one  spot 
within  the  group  possessing  mature  penumbra  should  be  given  the  intermediate 
classification  ‘i’.  In  the  case  where  a  group  of  the  Zurich  Classification  ‘C’  meets  all 
the  requirements  for  a  compactness  class  of  ‘c’,  a  case  statement  re-labels  that 
group  to  an  ‘i’  due  to  that  group  not  meeting  the  requirement  of  developed 
penumbra  on  both  ends  of  the  group.  For  spot  groups  that  do  not  meet  any  of  the 
requirements  for  the  compact  or  intermediate  classifications,  the  open  classification 
‘o’  is  given  as  a  catch  all  statement. 

3.7  Evaluation  of  Code  Product 

This  research  can  be  viewed  in  two  parts.  The  first  part  involves  the  creation 
of  an  automatic  code  capable  of  analyzing  sunspots  visible  in  the  photosphere  at  a 
very  high  rate.  The  second  part  involves  an  analysis  of  the  accuracy  of  sunspot 
detection  and  classification  data  produced  by  the  first  part.  The  method  of  research 
can  then  be  said  to  require  two  sets  of  analysis,  the  first  being  in  response  to  the 
code  itself  while  the  second  should  address  the  product  of  the  code.  In  this  thesis, 
only  the  latter  is  discussed.  This  is  because  the  former  focuses  heavily  on  image 
processing  techniques  that  have  been  well  established.  There  are  different  ways  to 
go  about  producing  the  results  obtained  by  the  SDO  code,  but  these  are  less 
important  than  the  actual  resulting  classifications  produced. 

To  fully  analyze  the  product  of  the  SDO  code,  data  is  produced  for  two  time 
series.  The  first  is  a  testing  period  where  a  variety  of  images  were  used  to  construct 
the  code  in  the  manner  outlined  in  this  chapter.  The  second  period  was  used  to 
evaluate  the  performance  of  the  finalized  code  developed  using  the  data  from  the 
first  period.  Each  of  the  data  series  used  spans  approximately  6  months  of  time. 

The  testing  period  data  was  from  6  July,  2012  through  31  December,  2012.  The 
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evaluation  data  spanned  from  1  January,  2013  to  29  July,  2013.  Analysis  of  the  first 
time  series  is  accomplished  in  Chapter  IV  while  the  second  time  series  evaluation  is 
shown  in  Appendix  3. 
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IV.  Analysis  and  Results 


To  obtain  a  full  picture  of  the  accuracy  of  the  SDO  product,  a  comparison  is 
done  with  respect  to  two  sources  in  three  directions  (SWPC  to  Holloman,  SDO  to 
SWPC,  and  SDO  to  Holloman).  The  first  comparison  between  a  set  of  standard 
observers  defines  the  acceptable  difference  range,  while  the  second  two  comparisons 
demonstrate  that  results  are  within  that  established  range.  These  comparisons  are 
accomplished  using  the  first  6  month  testing  period  data  while  the  same  comparison 
is  performed  for  the  second  6  months  in  Appendix  3.  After  first  showing  that 
numbers  for  sunspot  area,  number  of  groups  and  number  of  sunspots  detected  by 
the  SDO  code  are  not  anomalous,  a  metric  for  sunspot  classification  success  is 
described.  A  final  comparison  subsequently  shows  bias  in  the  hand  drawn  spot 
recording  and  classification  system  by  altering  specific  parameters  in  the  sunspot 
classification  code. 

4.1  Code  Output 

This  SDO  image  analysis  code  outputs  data  to  include  the  date  and  time, 
longitude  and  latitude,  the  area  of  the  spot  group,  the  number  of  umbras  in  each 
group,  and  the  assigned  McIntosh  Classification  for  that  spot  group.  Comparisons 
for  the  purpose  of  this  research  are  performed  with  both  Holloman  AFB  and 
SWPC.  Holloman  AFB  was  chosen  to  be  the  compared  standard  observatory 
because  its  report  times  are  closest  to  SWPC,  the  national  center  for  space  weather. 
In  this  way,  any  differences  in  reports  should  be  mostly  due  to  a  difference  in 
interpretation  of  the  classification  rules  as  timing  differences  are  minimized. 
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4.2  Area,  Group,  and  Spot  Accuracy 


The  first  step  in  showing  the  accuracy  of  the  SDO  method  of  automated 
sunspot  detection  and  classification  is  to  verify  quantities  produced  by  the  code  in 
comparison  to  each  of  the  other  entities.  These  quantities  include  the  total  area  of 
sunspot  groups  detected,  the  number  of  groups,  and  the  number  of  spots  detected 
on  any  given  day.  Each  of  these  quantities  will  have  a  certain  amount  of  bias 
included  because  of  a  multitude  of  different  factors,  so  each  item  will  be  addressed 
in  detail  to  show  satisfactory  values  are  achieved  before  progressing  to  demonstrate 
the  accuracy  of  the  classification  method.  The  end  goal,  however,  was  to  show  that 
the  SDO  product  provides  results  comparable  to  those  produced  by  both  SWPC 
and  Holloman. 

4.2.1  Area  Comparison. 

The  first  item  compared  is  the  total  area  of  sunspots  detected.  This  is  done  on 
a  day  by  day  basis  where  the  total  area  is  calculated  by  summing  the  area  of  each 
sunspot  group  reported  on  a  given  day  from  each  reporting  entity.  Several  caveats 
should  be  mentioned  regarding  the  calculation  of  this  number.  First,  the  range  of 
detection  of  the  SDO  code  inhibits  the  detection  of  sunspot  groups  beyond  roughly 
±75°  longitude,  varying  with  the  change  of  the  B  angle,  uniformly  applying  to  all 
other  compared  quantities.  Higher  B  angles  prevent  the  detection  of  sunspots  at 
smaller  values  of  longitude  in  the  southern  hemisphere  of  the  sun  while  the  opposite 
is  true  for  lower  B  angles.  This  issue  is  compromised  by  cutting  out  all  sunspots 
recorded  by  SWPC  and  Holloman  from  the  data  set  that  sit  outside  the  ±75° 
cutoff.  As  a  result,  these  sunspot  groups  don’t  contribute  to  any  of  the  area  totals 
for  SWPC  or  Holloman.  However,  there  is  a  possibility  that  the  SDO  code  will 
catch  portions  of  a  sunspot  group  that  exists  on  the  edge  of  the  cutoff  region  and 
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therefore  count  part  of  the  sunspot  group  under  the  SDO  area  while  the  same  spot 
group  may  have  been  cut  out  completely  from  SWPC  or  Holloman  reports. 
Conversely,  part  of  the  area  of  the  spot  may  be  cut  off  by  the  ±75°  cutoff  meaning 
that  the  total  area  detected  by  the  SDO  will  be  less  than  the  area  seen  by  SWPC  or 
Holloman.  In  addition  to  these  discrepancies,  the  position  of  each  sunspot  group 
reported  is  determined  using  the  latitude  and  longitude  layover  (discussed  in 
Appendix  1)  and  the  accuracy  of  this  method  is  questionable  near  the  edge  of  the 
solar  disk  due  to  the  effect  of  geometric  foreshortening  and  the  interpolation 
analysts  need  to  perform  to  obtain  the  region’s  position.  In  a  similar  display  of 
error,  the  longitude  for  SWPC  spot  reports  needs  to  be  projected  backwards 
approximately  8  hours  in  order  to  match  the  position  of  both  Holloman  and  the 
SDO  images.  Projection  of  sunspot  longitude  from  SWPC  reports  is  executed 
uniformly  by  subtracting  4  degrees  from  the  reported  longitude,  even  though  the 
actual  correction  may  need  to  be  slightly  higher  or  lower.  If  a  spot’s  position  is 
miss-reported,  that  spot  may  not  be  cut  out  of  the  data  set  when  it  would  be 
appropriate  to  do  so.  Fourthly,  the  SDO  code  calculates  the  area  in  a  similar  but 
more  spatially  precise  manner  than  Holloman  or  SWPC  as  described  in  Section  2.10 
and  Section  3.5.  Similar  to  decreasing  the  bin  size  on  a  Reimann  sum  to  calculate 
the  error  under  the  curve,  summing  pixels  in  a  sunspot  group  yields  a  more  accurate 
area  than  the  fitted  ellipse.  In  addition,  the  foreshortening  correction  is  calculated 
to  a  more  precise  degree.  While  the  Holloman  method  of  area  calculation  may  still 
be  considered  accurate  in  order  of  magnitude,  the  precision  of  the  method  is 
reduced  to  increments  of  10  millionths  of  a  solar  hemisphere  while  the  SDO  code 
calculates  these  values  to  a  much  smaller  decimal.  There  is  something  to  be  said  for 
the  quantization  of  the  area  value  into  individual  pixels,  but  as  was  stated  in 
Section  2.10,  the  spatial  resolution  of  these  pixels  is  still  significantly  better  than 
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that  of  a  pencil  tip  compared  to  quantization  effects.  Finally,  the  factor  determining 
which  pixels  qualify  as  sunspots  has  been  iteratively  determined.  The  pixels 
assigned  to  be  considered  part  of  each  sunspot  satisfy  the  requirement  of  being 
darker  than  their  surrounding  pixels  by  a  determinable  amount.  With  these  caveats 
in  mind,  the  analysis  proceeds  with  the  relationship  between  the  SDO  code  total 
area  product  and  the  reports  collected  from  both  Holloman  AFB  and  SWPC. 

Figure  10  summarizes  the  differences  between  the  calculations  of  total  area 
from  each  of  the  reporting  entities  from  6  July  2012.  The  first  thing  to  notice  is  that 
there  exists  no  significant  trend  of  difference  between  any  of  the  three  plotted 
relationships.  There  are  single  days  scattered  throughout  the  6  month  period  where 
the  difference  in  area  spikes  up  to  a  larger  value,  but  these  incidents  are  few  and 
correspond  to  days  with  large  sunspots  visible  on  the  solar  disk  where  the  extent  of 
the  penumbra  in  each  case  may  be  interpreted  differently  by  separate  observers. 
Additionally,  area  layover  accuracy  for  spots  with  an  area  over  500  millionths  of  a 
solar  hemisphere  is  questionable  (due  to  the  size  of  the  fitted  ellipse),  and  is 
therefore  a  likely  source  of  error.  Overall,  the  difference  plots  illustrate  that  the 
areas  produced  by  the  SDO  code  fall  concurrent  with  area  measurements  from  the 
Holloman  Observatory  as  well  as  SWPC  reports.  After  the  difference  plot,  each 
day’s  area  total  is  plotted  against  the  area  total  from  another  reporting  entity  in 
Figures  11-13.  Under  a  perfect  match,  each  point  would  lie  along  a  line  with  a  slope 
of  one  through  the  origin.  Paying  close  attention  to  Figure  11  where  the  areas  of 
Holloman  and  SWPC  are  compared,  in  general,  the  spread  of  points  increases  with 
increasing  area.  The  R  squared  value  for  the  linear  regression  line  through  each 
point  is  0.778.  In  both  of  the  subsequent  plots,  the  regression  displays  similar 
results  to  the  baseline,  returning  an  R  squared  value  of  0.780  for  the  SDO  to  SWPC 
comparison  and  0.749  for  the  Holloman  to  SDO  comparison,  seen  in  Figures  12  and 
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13  respectively.  Looking  next  at  the  equations  describing  the  trend  line,  some 
general  conclusions  can  be  drawn  about  the  nature  of  area  calculations  in  each  of 
the  three  reporting  entities  with  respect  to  the  others.  First,  Holloman  tends  to 
overestimate  the  total  area  of  sunspots  on  any  given  day  for  larger  spot  groups  due 
to  the  fact  that  in  both  Figure  11  and  Figure  13,  the  slope  of  the  regression  line  is 
less  than  one  (slopes  of  0.86  and  0.95).  Second,  the  SDO  result  is  more  similar  to 
the  SWPC  result  than  Holloman,  although  using  the  same  slope  analysis,  the  SDO 
reported  areas  are  generally  greater  than  SWPC  reported  total  area,  seen  from  the 
0.94  slope.  This  comparison  shows  that  the  SDO  code  produces  area  results  that 
are  in  line  with  those  of  both  SWPC  and  Holloman. 


Figure  10.  The  difference  between  the  daily  total  area  calculated  by  each  of  the  three 
reporting  entities  is  plotted  from  6  Jul  -  31  Dec  2012. 


4.2.2  Group  Number  Comparison. 

The  second  comparison  in  this  analysis  is  the  number  of  groups  reported  by 
each  entity.  Again,  this  comparison  is  conducted  on  a  day  by  day  basis,  summing 
the  number  of  groups  from  each  entity  over  the  179  day  test  period.  When 
comparing  groups  between  each  entity,  it  is  shown  that  while  the  grouping 
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Figure  11.  Regression  of  SWPC  and  Holloman  total  area  from  6  Jul  -  31  Dec  2012. 


algorithm  employed  in  the  SDO  code  is  different  from  past  methods  (Section  3.5.3), 
it  stays  close  to  the  groupings  provided  by  Holloman  and  SWPC.  It  is  important  to 
remember  that  while  all  reporting  entities  attempt  to  group  sunspots  in  the  same 
way,  there  may  be  differences  between  day  to  day  reports.  These  grouping 
differences  end  up  being  very  important  when  each  group  is  classified  into  a  certain 
McIntosh  category.  The  majority  of  misclassihed  groups  suffer  from  an  incorrect 
grouping  in  the  SDO  code,  so  it  is  important  to  minimize  error. 

There  are  three  elements  of  this  algorithm  that  will  contribute  to  differences  in 
the  sunspot  group  comparison.  First, due  to  the  additional  spatial  resolution  of  SDO 
images,  the  SDO  code  has  the  capability  to  detect  smaller  groups  that  may  go 
unnoticed  by  an  observer  at  an  optical  observatory.  This  additional  detection 
capability  will  tend  to  increase  the  number  of  sunspot  groups  seen  by  the  SDO  code 
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Figure  12.  Regression  of  SWPC  and  SDO  total  area  from  6  Jul  -  31  Dec  2012. 


compared  to  both  Holloman  and  SWPC.  The  second  item  to  consider  is  the  fact 
that  SWPC  reports  only  include  sunspot  groups  that  were  given  region  numbers 
prior  to  the  publishing  of  the  daily  report.  It  is  therefore  possible  for  Holloman  to 
see  a  group  before  any  other  observatory,  or  to  see  a  group  and  have  that  group  go 
unconfirmed  by  another  observatory.  This  would  result  in  the  omission  of  that 
particular  group  in  the  SWPC  report.  Moreover,  both  Holloman  and  SDO  may 
detect  and  report  this  spot  group,  but  it  may  be  missed  by  SWPC.  The  third 
possibility  is  that  there  is  a  difference  in  the  way  spots  were  grouped  between  any  of 
the  three  observatories.  While  Holloman  bases  its  groupings,  to  a  certain  extent,  on 
the  active  region  assignments  from  SWPC,  there  are  still  examples  where  the 
groupings  differ  between  the  two.  Generally,  this  will  be  after  an  active  region 
separation  call  is  made  by  SWPC,  leading  to  a  different  grouping  in  their  report. 


61 


Figure  13.  Regression  of  SDO  and  Holloman  total  area  from  6  Jul  -  31  Dec  2012. 

However,  the  opposite  process  can  also  occur  where  SWPC  does  not  follow  the 
grouping  made  by  Holloman.  This  determination  of  grouping  will  occur  external  to 
any  adjustments  in  the  SDO  code  (as  it  is  uninfluenced  by  outside  sources), 
resulting  in  a  miss-grouping. 

Comparing  the  different  number  of  groups  in  Figure  14,  we  see  that  the 
differences  are  minimal,  and  all  three  lines  oscillate  around  the  zero  difference  line. 
When  there  are  peaks  that  jump  up  to  a  larger  difference,  the  SDO  code  tends  to 
have  more  than  either  of  the  other  two  reporting  entities.  This  fact  is  also  made 
evident  in  the  subsequent  scatter  plots  that  illustrate  the  trend  of  one  particular 
reporting  entity  to  another  in  Figures  15-17.  Each  axis  shows  the  number  of  groups 
reported  on  a  particular  day  corresponding  to  the  report  issued  by  that  reporting 
entity.  Again,  perfect  agreement  results  in  a  point  lying  along  a  line  with  a  slope  of 
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one  through  the  origin.  Scatter  above  that  line  illustrates  an  over-determination  of 
the  number  of  spot  groups  for  that  particular  day  with  respect  to  the  other  entity 
while  the  opposite  is  true  for  scatter  under  the  line.  In  this  case,  the  Figure  15 
regression  shows  that  the  number  of  groups  reported  by  the  SDO  code  roughly 
matches  the  number  of  groups  from  Holloman  with  a  regression  slope  of  1.02 
accompanied  by  an  R  value  of  0.705,  indicating  that  any  significant  variation  in  the 
difference  between  Holloman  and  SDO  is  anomalous.  In  contrast,  Figure  16 
demonstrates  the  differences  between  SWPC  and  Holloman,  showing  that  there 
exists  a  trend  suggesting  Holloman  tends  to  report  more  groups  than  SWPC  with  a 
regression  slope  of  0.81  tilted  towards  the  Holloman  axis  with  an  R  value  of  0.703. 
Extending  this  to  the  third  plot  in  Figure  17,  SDO  tends  to  have  more  groups 
reported  than  SWPC  also,  with  a  smaller  regression  slope  of  1.07  tilted  towards  the 
SDO  axis  and  the  highest  R  squared  value  of  all  three  plots,  0.722.  This  is 
important  to  mention  because  it  helps  show  the  interconnected  nature  of  each 
comparison  and  illustrates  the  multiple  factors  that  can  go  into  determining  a 
match.  While  there  are  clearly  differences  in  the  number  of  groups  between  each 
reporting  entities,  the  comparison  method  can  proceed  under  the  knowledge  that 
there  is  fairly  good  agreement  across  all  observers.  Some  of  these  groups  may  be 
constructed  with  different  sunspots  causing  a  difference  in  classification,  but  it  has 
been  shown  that  the  number  of  groups  and  the  total  area  of  sunspots  reported  by 
the  SDO  code  is  not  anomalous. 

Longitude  and  longitude  matching  error  is  closely  linked  with  the  difference  in 
the  number  of  reported  sunspot  groups.  When  comparing  Holloman  to  SWPC, 
differences  indicate  either  that  Holloman  missed  a  spot  group  on  the  drawing  board 
or  that  a  group  Holloman  reported  has  not  been  confirmed  by  another  solar 
observatory.  Generally  speaking,  the  same  spots  are  detected  by  all  entities,  but 
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Figure  14.  The  difference  between  the  daily  total  number  of  groups  determined  by 
each  of  the  three  reporting  entities  is  plotted  from  6  Jul  -  31  Dec  2012. 


Figure  15.  Regression  of  SDO  and  Holloman  total  number  of  groups  from  6  Jul  -  31 
Dec  2012. 


separated  into  groups  in  different  manners  related  to  the  observatory,  the  particular 
solar  analyst,  or  some  additional  analysis  performed  on  the  particular  region. 


64 


Figure  16.  Regression  of  SWPC  and  Holloman  total  number  of  groups  from  6  Jul  -  31 
Dec  2012. 


Sometimes,  SWPC  will  choose  to  separate  out  a  group  into  two  distinct  active 
regions,  even  though  the  two  groups  are  close  enough  to  be  considered  part  of  the 
same  group  under  the  McIntosh  definition.  This  is  not  something  that  can  be 
influenced  in  the  automated  SDO  code  at  the  moment,  so  these  particular  days  will 
always  result  in  the  SDO  “missing”  a  group,  meaning  that  the  SDO  report  did  not 
have  a  group  available  to  match  the  location  of  the  other  reporting  entities.  Even 
so,  all  the  spots  comprising  the  two  groups  reported  by  SWPC  and  Holloman  are 
still  contained  in  the  one  group  defined  by  the  SDO  code.  Over  the  course  of  the  6 
months  of  data  tested,  only  one  spot  group  was  missed  by  the  SDO  code,  and  the 
classification  corresponding  to  this  miss  showed  that  the  missed  group  was  in  the 
formative  stage  of  its  life.  In  addition,  the  spot  group  was  very  far  out  towards  the 
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Figure  17.  Regression  of  SWPC  and  SDO  total  number  of  groups  from  6  Jul  -  31  Dec 
2012. 


limb  of  the  sun,  a  region  where  the  SDO  code  does  not  perform  as  well  due  to  the 
inaccuracy  of  the  limb  darkening  correction  in  that  region,  discussed  in  Section  2.3. 


4.2.3  Number  of  Spots  Comparison. 


The  final  element  to  compare  before  moving  on  to  the  discussion  of 
classification  accuracy  is  the  number  of  spots  reported  by  SDO  compared  to  SWPC 
and  Holloman.  In  addition  to  the  difference  in  resolution,  there  are  some  caveats 
that  must  be  mentioned  before  a  comparison  of  the  number  of  spots  from  each 
entity  can  be  analyzed. 

The  method  by  which  spots  are  counted  is  the  same  for  each  reporting  entity. 
Specifically,  only  umbras  are  counted  as  spots  in  each  case.  In  the  situation  where 
one  large  group  enclosed  by  penumbra  contains  multiple  umbras,  the  umbras  all 
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contribute  to  the  number  of  spots.  Additionally,  if  there  is  a  small  spot  with  no 
mature  penumbra  surrounding  it,  that  spot  is  still  counted  under  this  system. 
Applying  the  system  to  the  SDO  code,  the  process  by  which  sunspots  are  detected 
should  be  quickly  reiterated,  as  outlined  in  Section  3.4.  The  threshold  is  determined 
iteratively  by  decreasing  step  size,  resulting  in  a  black  and  white  image  representing 
the  penumbra  and  umbra  visible  on  the  solar  disk  in  white  pixels.  Using  this 
method,  the  possibility  of  detecting  a  sunspot  that  is  comprised  of  only  penumbra 
results  in  a  complicated  determination  of  spot  number. 

Understanding  that  each  of  these  penumbra  regions  may  also  contain  umbra 
(due  to  the  fact  that  if  a  pixel  is  darker,  it  will  likely  be  surrounded  by  less  dark 
pixels),  the  black  and  white  labeled  image  of  the  penumbra,  set  equal  to  the 
particular  sunspot  of  interest,  is  multiplied  back  into  the  black  and  white  image  of 
the  umbra.  The  result  of  this  matrix  multiplication  will  contain  the  location  of  all 
the  umbras  within  the  particular  penumbra  while  all  other  locations  are  set  to  zero. 
Those  umbra  are  then  counted  using  the  black  and  white  labeling  technique  and, 
after  subtracting  one  to  avoid  double  counting,  these  extra  spots  are  added  to  the 
total  number  of  penumbra  spots  from  the  previous  count.  Because  of  the  intensity 
thresholding  method  used  to  determine  the  penumbra  and  umbra,  there  is  a 
possibility  spots  may  be  separated  from  one  another  that  perhaps  wouldn’t 
separated  to  an  optical  observer,  ft  is  with  this  knowledge  that  the  analysis 
progresses  under  the  assumption  that  the  determination  of  sunspot  numbers 
through  the  SDO  code  is  valid. 

The  evident  trend  in  Figure  18  is  that  there  is  clear  separation  between  the 
SWP C-Holloman  and  the  SWPC-SDO  lines.  Both  the  SWPC-SDO  and  the 
Holloman-SDO  lines  sit  below  the  zero  line,  indicating  that  the  SDO  code  usually 
produces  a  report  seeing  more  sunspots  than  any  of  the  other  two  entities.  For 
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example,  the  solid  green  line  represents  the  number  of  spots  seen  by  the  SDO  code 
minus  the  number  reported  by  Holloman.  This  separation  is  further  confirmed  in 
the  subsequent  scatter  plots  displayed  in  Figures  19-21  where  each  entity  is 
compared  to  the  others  in  terms  of  sunspots  counted  for  any  particular  day.  Figure 
19  fully  illustrates  the  skew  with  a  slope  of  0.41,  significantly  less  than  the  one  to 
one  relationship  seen  in  either  of  the  area  or  group  number  comparisons.  With 
Figure  20,  the  trend  continues.  The  slope  for  this  figure  reduces  to  a  value  of  0.46, 
indicating  that  the  SDO  code  sees  more  spots  than  SWPC  as  well,  concurrent  with 
what  was  expected.  Figure  21  shows  a  trend  similar  to  the  other  two  figures  in  that 
the  number  of  spots  reported  by  Holloman  and  SWPC  are  roughly  equal  with  a 
regression  slope  of  0.94,  slightly  skewed  towards  the  Holloman  axis.  The  R  squared 
values  all  show  relatively  good  fit  agreement.  This  falls  in  line  with  the  trend  seen 
before  where  Holloman  will  generally  have  more  groups  than  SWPC,  further 
confirming  the  similarity. 


Figure  18.  The  difference  between  the  daily  total  number  of  sunspots  detected  by  each 
of  the  three  reporting  entities  is  plotted  from  6  Jul  -  31  Dec  2012. 


Figure  19.  Regression  of  SDO  and  Holloman  total  number  of  sunspots  from  6  Jul  -  31 
Dec  2012. 


4.3  Method  of  Classification  Comparison 


After  looking  at  the  differences  between  the  SDO  classification  result, 
Holloman  publications  and  SWPC  reports  for  total  reported  area,  number  of 
groups,  and  number  of  sunspots  seen,  the  next  step  is  to  demonstrate  the  accuracy 
of  the  automated  classification  method  compared  with  the  other  two  reporting 
entities.  In  addition,  the  differences  between  each  product  will  be  quantified  and 
any  cause  of  variation  between  each  entity  will  be  isolated.  One  of  the  main 
difficulties  surrounding  the  task  of  illustrating  the  accuracy  of  the  automated 
classification  method  is  the  fact  that  there  are  so  many  specific  quantities  that  go 
into  determining  a  classification.  Numerous  variables  make  it  difficult  to  see  the 
source  of  variation  and  they  can  cause  seemingly  large  differences  in  classification. 
Even  in  specific  tests  for  comparison,  it  is  possible  that  any  differences  seen  are  due 
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Figure  20.  Regression  of  SWPC  and  SDO  total  number  of  sunspots  from  6  Jul  -  31 
Dec  2012. 


to  multiple  factors  altering  the  classification  at  the  same  time.  These  factors  include 
elements  like  measuring  different  lengths  of  groups  and  the  lengths  of  spots  within  a 
group.  After  the  initial  accuracy  of  the  SDO  classification  method  is  established, 
many  additional  tests  are  performed  while  varying  a  single  element  of  the 
classification  process  to  demonstrate  the  effect  that  particular  item  holds  on  the 
accuracy  result. 

The  first  step  to  determining  the  validity  of  the  SDO  classification  method  is 
to  ensure  that  each  spot  group  seen  by  the  SDO  was  also  seen  by  the  other  two 
reporting  entities  and  vice  versa.  To  do  this,  data  from  the  two  sources  being 
compared  was  taken  on  a  day  by  day  basis  and  matched  based  on  the  longitude  and 
latitude  of  the  group.  All  reporting  entities  are  matched  using  the  same  method  to 
ensure  that  any  error  applied  through  the  method  would  be  uniform  for  every 
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Figure  21.  Regression  of  SWPC  and  Holloman  total  number  of  sunspots  from  6  Jul  - 
31  Dec  2012. 


comparison.  It  should  be  noted  that  matching  groups  in  this  way  results  in  the 
percent  match  between  SWPC  and  Holloman  being  lower  than  if  spot  groups  were 
compared  using  region  numbers. 

Displaying  the  position,  area  and  classification  of  each  spot  group  compared, 
Figure  22  demonstrates  a  typical  daily  match  between  Holloman  and  the  SDO  code. 
For  this  particular  section  and  day,  there  are  equal  numbers  of  detected  sunspot 
groups  from  both  the  Holloman  observatory  and  the  SDO  code.  Similar  outputs  are 
attainable  when  comparing  SDO  to  SWPC  or  SWPC  to  Holloman.  This  match  in 
number  of  groups  is  not  always  consistent  for  any  of  the  comparisons,  as  was  seen  in 
Figure  18,  but  when  there  are  missing  groups  in  any  of  the  data  sets,  it  is  almost 
exclusively  because  of  a  difference  in  grouping  between  the  SDO  and  the  other  two 
entities. 
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Figure  22.  Code  output  illustrating  the  matching  process  for  each  reporting  entity. 
Each  day  has  a  certain  number  of  sunspot  groups,  and  those  groups  will  be  matched 
based  on  their  proximity  to  groups  reported  by  the  other  reporting  entities.  From 
here,  the  classification  of  the  ’same’  group  will  be  compared  and  the  differences  in  the 
classification  method  of  each  entity  will  be  shown. 


To  quantify  the  accuracy  of  the  spot  finding  mechanism  of  the  SDO  code,  the 
comparison  output  is  used  to  find  the  number  of  missed  spot  groups  from  each 
reporting  entity  when  compared  to  the  other  reporting  entities.  Iterating  through 
each  spot  group  that  has  been  reported,  the  current  group  is  compared  to  other 
spot  groups  that  were  reported  on  the  same  day  by  another  reporting  entity. 
Matches  are  determined  by  locating  the  group  with  positions  that  best  match  the 
candidate  spot  group.  Ideally,  groups  reported  by  Holloman  and  the  SDO  will  be  in 
the  same  location  on  the  solar  disk  because  the  observation  times  roughly  match. 
For  comparisons  to  SWPC,  longitude  coordinates  are  projected  backwards 
uniformly  by  8  hours  to  the  approximate  time  when  the  SDO  image  and  Holloman 
spot  drawings  were  taken.  If  the  closest  spot  group  being  compared  is  within  10 
hcliographic  degrees  of  the  test  spot  group,  it  is  deemed  to  be  the  same  spot  group. 
A  range  of  10  heliographic  degrees  is  chosen  because  the  longitude/latitude  layovers 
used  by  Holloman  to  determine  the  position  of  the  center  of  a  spot  group  on 
sunspot  drawings  have  grid  marks  hatched  every  10  degrees.  This  method  therefore 
assumes  that  any  Holloman  observer  will  not  be  off  on  the  actual  position  of  the 
spot  group  and  the  reported  position  by  more  than  one  increment  on  the  longitude 
and  latitude  layover. 
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While  this  safety  net  does  allow  for  most  spots  to  be  correctly  related,  it  has 
two  downsides.  The  first  is  that  if  there  are  groups  that  are  closely  spaced  on  the 
solar  disk  that  do  not  belong  in  the  same  group,  there  is  a  chance  that  these  groups 
will  be  incorrectly  linked  with  another  proximate  group  that  should  not  match. 
Secondly,  the  less  common  occurrence  of  incorrect  reporting  of  sunspot  group 
position  can  cause  misses  when  the  two  groups  would  have  been  matched  in  the  case 
of  an  accurate  report.  These  mishaps  are  sometimes  corrected  when  mistakes  are 
discovered  by  SWPC  or  Holloman,  but  the  number  of  missed  mistakes  in  reported 
position  of  sunspots  is  not  negligible,  totaling  19  miscoded  entries  over  the  6  month 
test  period  surveyed  for  Holloman  AFB.  This  number  is  out  of  707  reported  groups 
in  this  time  period,  roughly  2.7%  of  the  total  reported  groups.  This  percentage  may 
be  considered  higher  if  the  post-publishing  corrected  mistakes  are  counted.  The 
error  percentage  is  low,  but  the  number  of  mistakes  is  equal  to  over  70%  of  the  miss 
percentage  (2. 7/3. 8)  between  the  SDO  code  and  Holloman.  Said  another  way,  the 
number  of  groups  that  the  SDO  missed  compared  to  Holloman  is  comparable  to  the 
number  of  miscoded  spot  groups  reported  by  Holloman  AFB.  Extending  this 
realization  to  SWPC  reports,  it  is  reasonable  to  assume  that  there  are  miscoded 
entries  in  that  data  set  as  well  (perhaps  to  a  lesser  extent).  Unfortunately,  this 
hypothesis  is  impossible  to  verify  in  the  same  manner;  Holloman  records  their  spot 
drawings  for  each  day  making  it  easy  to  go  back  and  visually  confirm  any  report. 
This  is  not  the  case  for  SWPC,  so  it  is  impossible  to  reduce  any  error  in  SWPC’s 
publication  and  the  possibility  of  an  incorrect  reporting  must  be  considered 
uncorrect  able. 

When  looking  at  any  particular  reporting  entity  in  Table  4,  it  can  be  seen  that 
the  match  percentage  of  groups  found  by  the  SDO  code  compared  to  groups  that 
either  Holloman  or  SWPC  found  is  always  greater.  Displayed  in  the  table  in  this 
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case  is  the  number  of  missed  groups  by  each  reporting  entity  when  compared  to 
other  entities,  and  in  all  comparison  tests,  the  SDO  misses  the  fewest  number  of 
groups  compared  to  the  other  entity.  It  should  be  noted  that  in  the  case  of 
comparing  SDO  to  both  Holloman  and  SWPC  (on  the  bottom  row  of  Table  4), 
there  are  significantly  more  missed  groups.  This  can  be  the  case  because  of  several 
different  reasons,  the  main  one  being  that  the  SDO  can  detect  smaller  spots  than 
either  Holloman  or  SWPC.  Each  spot  group  detected  is  required  to  have  more  than 
0.3  millionths  of  a  solar  hemisphere  in  area  in  order  to  be  considered  an  actual  spot 
group.  This  is  more  or  less  an  arbitrary  requirement,  but  it  was  selected  based  on 
the  achieved  match  percentage.  Because  of  this  limit,  additional  dark  spots  that  are 
detected  using  the  SDO  code  are  neglected  when  comparing  to  Holloman  or  SWPC. 
In  the  matching  process,  it  is  possible  to  increase  the  requirements  to  be  considered 
a  spot  group  so  that  larger  areas  must  be  achieved  to  be  kept  in  the  comparison 
cycle.  The  groups  that  would  be  cut  out  in  this  case  are  small  groups  comprised  of 
one  or  two  spots,  likely  spot  groups  in  the  formative  stage  of  evolution.  However, 
the  position  match  percentage  drops  to  lower  values  after  this  selective  processing. 
Moreover,  the  point  of  this  comparison  is  to  show  the  number  of  additional  spot 
groups  that  the  SDO  code  can  detect,  and  the  percent  increase  of  additional  spot 
groups  matched  to  either  Holloman  or  SWPC.  These  positive  results  demonstrating 
the  limited  number  of  missed  sunspot  groups  between  SDO  and  SWPC/Holloman 
illustrates  the  accuracy  of  this  sunspot  detection  method.  While  not  all  groups  are 
detected,  a  very  low  miss  percentage  when  comparing  SWPC,  Holloman,  and  the 
SDO  code  shows  the  success  of  this  detection  method. 


74 


Table  4.  Location  matching  between  each  reporting  entity  shown  in  raw  count  and 
percentages  over  the  July  2012  -  December  2012  time  span.  Spot  groups  were  limited 
by  requiring  that  SDO  spots  seen  must  have  an  area  of  more  than  0.3  millionths  of  a 
solar  hemisphere.  When  comparing  the  reporting  entity  in  the  left  column  with  any 
of  the  other  reporting  entities  listed  across  the  top,  the  numbers  listed  illustrate  the 
amount  of  spot  groups  the  top  entity  failed  to  match  compared  to  the  left  entity. 


Holloman 
Missed  Groups 

SWPC 

Missed  Groups 

SDO 

Missed  Groups 

Total 

Groups 

Holloman 

- 

61/707  (8.6%) 

25/659  (3.8%) 

707 

SWPC 

84/730  (11.5%) 

- 

44/686  (6.4%) 

752 

SDO 

223/857  (26.0%) 

214/857  (25.0%) 

- 

857 

4.4  Classification  Accuracy  Evaluation 


The  next  stage  of  analysis  involves  determining  the  accuracy  of  the 
classification  method  used  on  the  detected  sunspot  groups.  This  comparison  is  more 
difficult  simply  because  of  the  various  factors  that  can  affect  the  determination  of  as 
sunspot  group’s  class.  In  addition  to  the  numerous  deciding  factors  for  a  sunspot 
group’s  classification,  the  classification  process  is  largely  ambiguous  meaning  that 
the  difference  between  even  the  simplest  of  classifications  can  be  minuscule  and 
subjective.  For  example,  when  looking  at  the  first  letter  of  the  McIntosh 
classification,  classifications  for  groups  that  have  developed  penumbra  on  both  the 
leading  and  trailing  spots  in  the  group  are  determined  by  the  length  of  the  spot 
group.  As  an  example,  groups  that  are  less  than  15  hcliographic  degrees  in  length 
are  given  the  classification  ‘E’,  while  groups  of  length  greater  than  15  hcliographic 
degrees  are  designated  ‘F’  class  groups.  Because  there  is  a  hard  cutoff  for  this 
distinction,  the  difference  between  two  separate  classifications  could  simply  be  that 
one  observer  determines  the  group  to  be  over  15  degrees  in  length  while  the  other 
observer  determines  the  group  to  be  under  15  degrees  in  length.  This  difference  is 
not  major  in  terms  of  an  accurate  representation  of  the  length  of  the  spot  group, 
but  it  does  create  a  large  difference  in  the  classification  as  a  whole  and  would 
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therefore  be  flagged  as  an  incorrect  classification  in  a  comparison  step.  Moreover, 
the  first  letter  of  the  McIntosh  classification  is  not  the  only  letter  affected  by  this 
subjectivity.  All  three  letters  of  the  classification  process  are  subject  to  unspecific 
wording  and  ambiguous  thresholds  that  determine  a  spot  group  to  be  of  one 
classification  or  another.  While  separate  observers  can  interpret  the  data  differently, 
a  consistent  classification  process  using  the  SDO  code  is  of  principle  importance.  It 
is  with  this  goal  in  mind  that  the  parameters  determining  an  accurate  classification 
are  defined. 

In  order  to  establish  the  accuracy  of  a  classification  method,  three  separate 
standards  for  comparison  are  employed,  each  level  further  adapting  to  the 
subjectivity  of  the  McIntosh  classification.  A  60  x  60  matrix  is  created  where  each 
column  and  row  has  been  assigned  one  of  the  60  McIntosh  classifications.  Equal  row 
and  column  numbers  are  assigned  the  same  classification  so  that  the  diagonal 
“coordinates”  are  comprised  of  matching  McIntosh  classifications.  The  order  of  the 
60  classifications  across  the  rows  and  columns  is  the  same  for  both  columns  and  rows 
and  objectively  follows  the  best  representation  of  the  evolution  of  a  sunspot  group 
(based  on  length  and  maturity  of  the  penumbra).  The  order  given  to  the  Zurich 
classifications  is  as  follows:  A,  H,  B,  C,  D,  E,  F.  Within  each  of  these  classifications, 
the  subsequent  letters  area  also  ordered.  For  example,  there  are  two  different  types 
of  McIntosh  classifications  for  the  ‘B’  designation.  The  overall  order  of  the  two  ‘B’ 
classifications  in  the  comparison  matrix  depends  on  the  letters  following  ‘B’.  For  the 
penumbra  classification,  the  order  follows  the  string:  x,  r,  s,  h,  a,  k.  The  order  for 
the  compactness  classification  is:  x,  o,  i,  c.  Only  the  possible  McIntosh 
classifications  are  listed  along  each  row  and  column.  This  matrix  will  be  used  to 
show  classifications  that  are  proximate  to  others  as  any  matched  classification  will 
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be  proximate  to  the  diagonal.  Classifications  far  off  the  diagonal  are  less  accurate, 
in  general.  The  order  of  classifications  in  the  comparison  matrix  is  listed  in  Table  5. 


Table  5.  The  order  of  McIntosh  classifications  is  presented  adjacent  to  both  the  column 
and  row  numbers.  Each  classification  has  been  assigned  both  a  row  and  column  number 
in  the  comparison  matrix  according  to  the  best  representation  of  sunspot  group  growth. 


Col/Row 

Class 

Col /Row 

Class 

Col/ Row 

Class 

Col/Row 

Class 

1 

Axx 

16 

Chi 

31 

Dki 

46 

Eke 

2 

Hrx 

17 

Cko 

32 

Dkc 

47 

Fro 

3 

Hsx 

18 

Cki 

33 

Ero 

48 

Fri 

4 

Hax 

19 

Dro 

34 

Eri 

49 

Fso 

5 

Hhx 

20 

Dri 

35 

Eso 

50 

Fsi 

6 

Hkx 

21 

Dso 

36 

Esi 

51 

Fsc 

7 

Bxo 

22 

Dsi 

37 

Esc 

52 

Fao 

8 

Bxi 

23 

Dsc 

38 

Eao 

53 

Fai 

9 

Cro 

24 

Dao 

39 

Eai 

54 

Fac 

10 

Cri 

25 

Dai 

40 

Eac 

55 

Fho 

11 

Cso 

26 

Dac 

41 

Elio 

56 

Fhi 

12 

Csi 

27 

Dho 

42 

Ehi 

57 

Flic 

13 

Cao 

28 

Dhi 

43 

Ehc 

58 

Fko 

14 

Cai 

29 

Dhc 

44 

Eko 

59 

Fki 

15 

Cho 

30 

Dko 

45 

Eki 

60 

Fkc 

The  simplest  way  to  determine  the  accuracy  of  the  SDO  classification  method 
is  to  do  a  one  to  one  comparison  between  the  results  from  two  reporting  entities. 
This  means  that  when  a  group  is  matched  based  on  proximity  to  another  group 
reported  by  a  different  entity,  each  letter  of  the  classification  must  be  matched 
exactly.  If  any  part  of  the  two  classifications  do  not  match,  or  more  specifically  the 
match  corresponds  to  a  point  that  does  not  he  on  the  diagonal  of  the  comparison 
matrix,  the  issued  classification  is  not  adequate  under  this  metric.  This  type  of 
matching  method  will  be  referred  to  as  the  direct  method.  Remembering  that  there 
is  a  large  amount  of  subjectivity  in  the  McIntosh  classification  process,  the 
requirements  are  loosened  to  an  intermediate  category.  This  metric  includes  the 
entirety  of  the  first  letter  of  the  McIntosh  classification,  accepting  any  classification 
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that  matches  the  first  letter.  Finally,  for  a  broad  category  enclosing  most 
foreseeable  subjectivities  in  the  classification  process,  the  relaxed  metric  is  defined. 
This  metric  accepts  all  matches  that  fit  the  first  letter  of  the  McIntosh  classification. 
The  specific  regions  of  acceptable  classifications  should  be  outlined.  Including  all 
areas  accepted  in  the  previous  two  metrics,  this  final  category  accepts  neighboring 
Zurich  Classifications  with  broader  acceptance  bands  in  the  ’H’  category.  The  order 
(listed  in  Table  5)  was  chosen  to  most  accurately  represent  the  most  common 
evolution  of  a  spot  group  so  that  the  closest  and  most  similar  classifications  are 
neighbors.  Typically,  the  ‘H’  category  is  placed  after  ‘F’,  but  that  jump  in  terms  of 
size,  shape,  number  of  spots,  and  complexity  of  magnetic  field  is  substantial,  so 
instead  the  ‘H’  category  is  placed  more  towards  the  formative  stage  of  the  sunspot 
group’s  life.  For  this  metric  therefore,  matches  accept  classifications  of  the 
neighboring  letter,  for  example,  ‘C’  accepts  both  ‘B’  and  ‘D’.  For  the  ‘H’ 
classification,  the  ‘C’  group  is  also  accepted  because  of  the  potential  loss  of  small 
spots  surrounding  the  main  spot  with  developed  penumbra  in  the  ‘H’  classification. 
If  the  presence  of  smaller  spots  is  overlooked,  a  group’s  polarity  may  never  be  tested 
and  that  spot  group  may  be  classified  incorrectly  as  a  result.  This  is  accounted  for 
in  the  relaxed  metric  of  comparison.  All  three  metrics  are  viewable  on  the  matrix 
representation  in  Figure  23. 

Again,  a  classification  issued  from  Holloman  or  SWPC  does  not  necessarily 
represent  the  ground  truth.  Simply  coming  up  with  a  different  classification  than 
was  published  does  not  indicate  any  significant  error  in  a  classification  process, 
especially  when  the  issued  classification  from  either  of  those  two  reputable  sources 
may  have  been  different  if  an  alternate  solar  analyst  had  happened  to  be  on  duty 
that  day. 
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Figure  23.  Each  metric  for  classification  comparison  is  summarized  on  the  60x60  matrix. 
Every  row  and  column  number  corresponds  to  an  individual  McIntosh  classification 
listed  in  Table  5.  The  Direct  metric  is  the  most  stringent  while  the  Relaxed  method 
is  the  most  open.  Each  match  made  between  reporting  entities  will  end  up  somewhere 
on  the  matrix,  but  good  matches  occur  near  the  diagonal  line. 
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A  baseline  to  compare  the  classification  method  of  the  SDO  code  is  established 
through  a  comparison  of  SWPC  reports  to  Holloman  drawings.  While  still 
extremely  variable,  this  comparison  shows  the  difference  between  the  classification 
of  Holloman  and  the  reported  classifications  coming  from  SWPC.  Comparing 
Holloman  to  SWPC  is  not  necessarily  a  good  indicator  of  accuracy  as  one 
classification  is  dependent  on,  or  suffers  influence  from,  the  classification  of  the 
other.  Because  SWPC  publications  are  an  amalgamation  of  the  classification  results 
of  several  different  observatories,  differences  in  the  code  may  not  be  as  significant  as 
they  would  be  if  SWPC  was  making  their  own  observations.  Ideally,  the 
classification  process  from  the  SDO  should  show  a  difference  comparable  to  the 
difference  between  the  other  two  reporting  entities.  This  would  illustrate  the  SDO 
code  result  lies  within  the  same  amount  of  difference  from  the  baseline.  Therefore, 
results  that  are  proximate  to  the  difference  between  SWPC  and  Holloman  are 
considered  good  as  they  demonstrate  the  ability  of  the  SDO  code  to  replicate  the 
Holloman  classification  process  in  an  independent,  automated  manner. 

It  should  be  noted  that  this  comparison  metric  does  not  capture  the 
consistency  of  the  second  and  third  letter  of  the  McIntosh  code.  These  parts  of  the 
classification  tend  to  get  lumped  into  a  single  percentage,  so  an  additional 
assumption  of  this  metric  is  that  these  later  categories  are  less  important  to  the 
overall  classification  scheme,  at  least  at  this  stage  of  code  development.  Additional 
tests  can  be  run  to  show  specific  accuracy  for  these  areas,  but  these  tests  have  not 
yet  been  attempted. 

4.4.1  SWPC  to  Holloman. 

The  baseline  for  classification  comparisons  is  set  between  Holloman  and 
SWPC.  The  numbers  from  this  section  will  next  be  compared  to  the  SDO  to  SWPC 
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numbers  to  establish  the  proximity  of  the  SDO  method  to  Holloman  and  SWPC 
when  comparing  to  the  Holloman/SWPC  standard. 


Table  6.  Summary  of  Holloman  and  SWPC  code  match  percentages  for  the  three 
different  match  metrics. 


Metric 

Match  % 

Direct 

33.73% 

Intermediate 

59.28% 

Relaxed 

87.22% 

The  relaxed  metric  contains  the  vast  majority  of  all  the  matched  sunspot 
groups.  Outliers  predominantly  he  on  the  upper  side  of  matching  matrix,  indicating 
that  SWPC  classifications  tend  to  have  a  higher  classification  compared  to 
Holloman.  For  example,  when  Holloman  classifies  a  group  as  ‘C’  and  SWPC’s 
classification  doesn’t  match,  SWPC  will  in  general  give  that  group  a  classification  of 
‘D’  or  higher.  The  intermediate  metric  also  holds  a  substantial  portion  of  match 
points.  Many  of  the  sunspots  matched  fit  within  the  first  letter  of  the  McIntosh 
classification,  and  this  proves  to  be  the  highest  direct  match  of  any  comparison. 

4.4.2  SWPC  to  SDO. 

Next,  the  same  type  of  comparison  is  applied  to  the  SWPC  and  SDO  match. 
This  will  enable  a  comparison  between  the  SDO  code  prodct  and  the  baseline 
SWPC/Holloman  match.  For  the  test  data  set  analyzed  here,  the  SDO  classification 
method  was  programmed  to  obtain  results  as  close  as  possible  to  the  McIntosh 
classification  scheme.  This  means  that  all  values  for  determining  classification  were 
taken  directly  from  [ McIntosh ,  1990].  No  attempt  was  made  to  alter  parameters  to 
best  match  any  of  the  other  reporting  entities. 

In  this  SDO  to  SWPC  case,  the  percent  match  is  dissimilar  to  the  SWPC  and 
Holloman  match  in  almost  all  categories.  Specifically,  the  direct  match  is 
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Table  7.  Summary  of  SWPC  and  SDO  code  match  percentages  for  the  three  different 
match  metrics  for  the  6  July  -  31  December  2012  data  set. 


Metric 

Match  % 

Direct 

13.46% 

Intermediate 

49.26% 

Relaxed 

83.74% 

significantly  lower  than  the  baseline  match  set  in  Section  4.4.1,  but  this  fact  is  not 
surprising  considering  the  wide  variation  displayed  by  the  classification  process. 
Paired  with  the  fact  that  SWPC  and  Holloman  classifications  are  not  unbiased 
towards  each  other,  it  is  reasonable  to  see  that  the  direct  match  percentage  between 
SWPC  and  Holloman  should  be  higher  than  the  direct  match  percentage  between 
SWPC  and  SDO.  Following  the  same  line  of  reasoning,  the  intermediate  match 
percentage  will  be  lower  as  well.  As  it  turns  out,  the  difference  has  diminished  to 
roughly  10%,  cutting  the  direct  match  percentage  difference  by  a  factor  of  two. 
When  looking  at  the  relaxed  metric,  it  can  be  seen  that  the  match  between  SWPC 
and  SDO  is  significantly  closer  to  the  Relaxed  metric  from  SWPC  to  Holloman, 
indicating  a  good  method  for  classification,  within  4%  error.  A  major  point  to 
emphasize  is  the  fact  that  SDO  classification  code  is  uninfluenced  by  any  bias  from 
outside  sources,  and  this  match  percentage  is  achieved  without  any  user  input  or 
influence  of  past  classifications.  Additionally,  the  classification  process  is  not  biased 
by  outside  observations  made  prior  to  the  development  of  the  code.  This  establishes 
that  the  SDO  classification  method  is  an  acceptable  tool  capable  of  being  effectively 
compared  to  other  classification  methods. 
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4.4.3  SDO  to  Holloman. 


Next,  the  method  is  applied  to  matches  with  Holloman  AFB.  The  resulting 
match  percentages  are  much  better  than  those  obtained  when  comparing  SDO  to 
SWPC. 


Table  8.  Summary  of  Holloman  and  SDO  code  match  percentages  for  the  three  different 
match  metrics. 


Metric 

Match  % 

Direct 

20.42% 

Intermediate 

53.05% 

Relaxed 

86.04% 

With  improvement  in  each  of  the  comparison  metrics  in  contrast  with  the 
SDO  to  SWPC  comparison,  the  SDO  to  Holloman  match  is  closer  to  the  baseline 
from  Section  4.4.1.  The  direct  metric  is  the  most  improved,  increasing  the  match 
percentage  for  this  category  by  a  little  more  than  7%  from  the  direct  comparison  in 
Section  4.4.2.  However,  the  direct  metric  match  percentage  in  this  comparison  is 
still  lower  than  the  Holloman  to  SWPC  match,  as  expected.  The  intermediate  result 
improved  a  little  less  than  4%,  while  the  relaxed  metric  sits  deficient  from  the 
SWPC  to  Holloman  match  by  a  single  percent.  This  comparison  demonstrates 
further  the  usefulness  and  accuracy  of  the  SDO  method  compared  to  both  of  the 
other  comparison  methods  used  in  this  study. 

4.5  Classification  Consistency 

To  show  the  consistency  of  this  classification  method,  an  additional  6  months 
of  data  was  tested  under  the  same  conditions  and  the  resulting  percentages  were 
similar,  shown  in  Table  9.  This  additional  6  months  of  data  represents  a  blind  test 
executed  on  a  data  set  that  was  not  used  to  optimize  match  percentages.  There  are 
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Table  9.  Summary  of  all  three  metrics  for  judging  accuracy  of  classification  applied  to 
the  three  comparison  scenarios.  While  the  direct  and  intermediate  match  percentage 
is  clearly  significantly  lower  in  the  SWPC  to  SDO  match,  the  relaxed  metric  shows 
that  results  are  still  within  the  general  purview  of  the  SWPC  to  Holloman  match.  The 
Testing  period  data  set  spans  from  6  July  2012  through  31  December  2012  while  the 
Evaluation  data  set  runs  from  1  January  2013  to  29  June  2013. 


SWPC  to 
Holloman 

SWPC  to 
SDO 

SDO  to 
Holloman 

Direct 

33.73  % 

13.46  % 

20.42  % 

Testing 

Intermediate 

59.28  % 

49.26  % 

53.05  % 

Relaxed 

87.22  % 

83.74  % 

86.04  % 

Direct 

33.38  % 

20.22  % 

24.54  % 

Evaluation 

Intermediate 

57.48  % 

51.25  % 

50.91  % 

Relaxed 

87.67  % 

83.80  % 

80.65  % 

improvements  that  can  be  made  to  this  data  set  similar  to  those  outlined  in  Section 
4.3  that  were  not  applied  to  this  comparison,  indicating  that  the  numbers  comparing 
SWPC  to  SDO  and  SDO  to  Holloman  could  be  better  than  the  ones  shown,  but  the 
results  are  similar  enough  to  show  consistency  between  the  two  different  testing 
periods.  Consistency  between  both  data  sets  shows  that  the  results  obtained  are 
statistically  significant  and  will  likely  not  change  for  additional  testing  periods. 

In  addition  to  establishing  the  accuracy  of  the  SDO  method,  this  comparison 
also  sheds  light  on  an  additional  use  for  the  SDO  code.  Due  to  the  consistency  of 
this  method,  paired  with  the  fact  that  this  comparison  method  differs  somewhat 
from  the  Holloman  method,  the  automated  SDO  code  can  be  used  to  evaluate  the 
bias  contained  in  the  Holloman  drawing  and  classification  method.  In  theory,  there 
is  some  amount  of  bias  contained  in  both  methods.  However,  the  SDO  code  uses  the 
same  images  to  produce  the  same  classification  results  every  run,  and  the  code  can 
be  altered  to,  in  turn,  alter  the  resulting  data.  Therefore,  systematically  changing 
elements  in  the  SDO  code  to  produce  classification  results  that  better  match  the 
results  obtained  by  Holloman  will  quantify  the  inherent  bias  for  particular 
classifications.  This  research  does  not  endeavor  to  quantify  the  bias  of  the  other  Air 
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Force  optical  observatories,  but  rather  hopes  to  show  the  method  of  comparison  for 
future  pursuit  of  this  purpose. 

4.6  Holloman  AFB  Classification  Bias  Evaluation 

The  classification  process  used  by  the  SDO  code  is  based  on  the  same  set  of 
defined  rules  for  classification  used  by  Holloman  AFB,  and  an  acceptable  match 
percentage  has  been  achieved.  If  parameters  of  the  SDO  code  in  the  classification 
section  are  altered  (spot  eccentricity,  spot  length  and  group  length),  the  match 
percentages  change.  At  some  value  (different  for  each  of  these  parameters),  the  error 
between  matched  group  classifications  will  reach  a  minimum.  If  parameters  are 
altered  one  at  a  time,  this  minimum  point  will  indicate  the  exact  bias  for  that 
parameter  with  respect  to  the  SDO  code.  For  example,  there  is  a  cutoff  between  ‘s’ 
and  ‘a’  type  sunspot  groups  at  2.5  degrees.  Both  the  SDO  code  and  Holloman  are 
making  classifications  according  to  this  metric.  If  the  SDO  code  is  altered  to  2.8 
degrees,  however,  and  the  match  percentage  increases,  this  would  indicate  that  the 
cutoff  used  by  Holloman  may  actually  be  closer  to  2.8  degrees.  Parameters  selected 
to  be  evaluated  are  calculated  to  a  much  higher  precision  in  the  SDO  code  than 
their  counterpart  parameters  on  the  Holloman  drawing  board  (see  Appendix  1). 

The  spot  symmetry  parameter  is  subjective,  but  the  parameters  involving  spot 
length  are  not.  Match  percentages  improve  when  these  parameters  are  altered  to 
optimal  levels,  quantizing  the  general  tendency  for  Holloman  solar  analysts  to  label 
a  spot  group  one  way  over  another. 

This  method  does  not  necessarily  illustrate  any  error  in  the  classification 
process  used  by  Holloman  because  both  methods  (SDO  and  Holloman)  are  based  off 
the  McIntosh  method.  Rather,  this  comparison  process  further  illustrates  the 
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differences  between  the  two  methods  and,  most  importantly,  gives  a  metric  by  which 
to  measure  the  Holloman  observer  bias. 

To  determine  the  correctness  of  the  fit,  a  new  system  of  analysis  based  on  the 
60  x  60  matrix  from  Section  4.4  is  used.  To  judge  how  well  the  classification  fits  the 
spot  groups  that  have  been  matched  with  Holloman  AFB,  a  system  is  created  to 
quantify  how  the  average  classification  compares  to  the  actual  classification  given. 
This  comparison  can  be  done  with  respect  to  both  entities,  meaning  that  both  rows 
and  columns  can  be  compared  to  one  another  on  the  matrix.  For  the  explanation  of 
the  method,  rows  will  be  used,  but  the  same  process  can  be  applied  to  columns  as 
well.  To  quantify  the  variation,  a  single  row  is  isolated  in  the  60  x  60  matrix,  taking 
note  of  the  row’s  number  (1  -  60).  Next  the  biased  weighted  sample  variance  is 
determined  to  show  how  each  parameter  change  affects  the  total  accuracy. 


a 


2 

row 


60 


i=  1 


60 


(16) 


A 

i=  1 

where  ay  represents  the  current  column  and  wy  is  the  number  of  spots  in  that 
element.  An  example  calculation  is  shown  in  Figure  24.  The  value  of  arow  is  then 
solved  and  added  in  quadrature  to  the  corresponding  acoiumn  value  to  get  a  total  a. 
This  parameter  is  then  used  to  confirm  a  better  fit  for  matched  sunspot 
classifications.  In  addition  to  a  crtotai  value,  the  summation  can  be  limited  to  break 
out  each  of  the  various  Zurich  classifications  into  separate  categories  to  find  more 
specific  weighted  sample  variances.  In  this  way,  it  can  be  seen  how  altering  each 
parameter  affects  the  individual  Zurich  classifications  alongside  the  total  variation 


of  the  matrix. 


Axx  Hrx  Hsx  Hax  Hhx  Hkx  Bxo  Bxi  Cro  Cri  Cso  Csi  Cao  Cai  Cho  Chi  Cko  Cki  Dro 


Axx  Hrx  Hsx  Hax  Hhx  Hkx  Bxo  Bxi  Cro  Cri  Cso  Csi  Cao  Cai  Cho  Chi  Cko  Cki  Dro 


Figure  24.  The  classification  Hhx  was  selected  to  highlight  the  calculation  of  the  biased 
weighted  sample  variance  for  a  generic  code  comparison  so  all  other  values  outside  the 
single  row  are  set  to  zero.  A  classification  of  Hhx  is  5  units  from  the  origin  and  therefore 
represents  the  bias.  Summing  all  the  squared  weighted  distances  and  dividing  by  the 
total  weight,  the  average  location  for  the  Hhx  comparison  is  attained.  Subtracting  the 
bias  yields  the  final  variation  from  the  center  point.  This  comparison  is  done  in  the 
vertical  direction  as  well  and  added  in  quadrature  to  form  the  variation  metric. 


4.6.1  Modified  Zurich  Classification  Optimization. 

The  first  bias  evaluated  was  the  determination  of  the  first  letter  of  the 
McIntosh  classification,  the  modified  Zurich  classification.  This  letter  is  determined 
off  the  total  length  of  the  group  for  classifications  of  lD’  and  higher.  To  calculate 
the  classification,  it  must  first  established  whether  or  not  there  is  complete 
penumbra  surrounding  more  than  one  of  the  spots  in  the  group.  This  condition 
determines  whether  or  not  a  spot  can  be  considered  eligible  for  a  classification  above 
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a  ‘C’.  Proceeding  under  the  assumption  that  this  classification  method  is  sufficient 
with  respect  to  both  SWPC  and  Holloman  reports,  the  classification  method  is  also 
tested  while  varying  the  lengths  that  determine  the  first  letter.  To  initiate  the 
variation,  the  length  requirements  that  define  classes  ‘D’,  ‘E’,  and  ‘F’  were  all 
extended  by  0.5°  increments.  An  optimal  length  combination  will  provide  the  best 
match  between  Holloman  and  the  SDO  code. 

From  Table  10,  it  can  be  seen  that  there  are  particular  classifications  for  which 
the  biased  weighted  sample  variance  of  the  match  between  the  SDO  code  and 
Holloman  classification  is  minimized.  Not  all  minima  for  each  subdivision  lines  up 
with  the  total  minimum.  In  general,  the  test  that  included  a  bias  of  17.5°  for  the  F 
class,  12.5°  bias  for  the  E  class,  and  7.5°  bias  for  the  D  class  had  a  lower  variance 
than  the  other  tests.  In  the  break  out  the  variance  calculation  visible  in  the 
subsequent  columns,  it  can  be  seen  that  the  subsection  variance  becomes  minimum 
for  other  tests,  potentially  providing  a  more  specific  bias.  However,  this  evaluation 
simply  focuses  on  the  value  of  cr total  •  The  main  take  away  from  this  result  is  that  by 
altering  the  length  at  which  each  of  these  classifications  is  used,  the  match 
percentage  between  the  SDO  code  and  Holloman  can  be  increased,  indicating  that 
those  new  length  requirements  are  more  appropriate  to  describe  the  classification 
method  used  for  the  first  letter  by  Holloman. 

It  should  be  noted  that  the  optimization  of  the  modified  Zurich  classification 
was  established  using  an  older  version  of  code  that  differed  in  how  sunspot  numbers 
were  counted.  If  each  code  run  was  recreated,  it  is  likely  that  the  bias  would  be 
similar  as  the  differences  between  the  code  is  minimal. 
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Variance 

4.21 
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3.74 
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3.19 
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3.11 

2.89 

2.80 
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Variance 

2.12 

2.12 

3.01 

2.12 

2.12 

2.05 

2.05 

2.12 

2.05 

2.05 

2.05 

2.05 

2.05 

Total  Weighted 
Variance 

11.58 

11.48 

11.23 

11.35 

11.10 

11.04 

11.00 

10.87 

11.21 

12.00 

12.50 

12.61 

12.94 
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4.6.2  Eccentricity  Classification  Optimization. 


The  second  bias  established  was  with  respect  to  the  penumbra  classification’s 
symmetry,  the  second  letter.  The  value  used  in  the  development  of  the  McIntosh 
classification  was  the  eccentricity  of  the  largest  spot.  This  parameter  was  used  to 
establish  whether  or  not  a  spot  was  symmetric  or  asymmetric.  It  was  determined 
that  on  a  scale  where  a  perfect  circle  has  an  eccentricity  of  zero  and  the  most 
eccentric  closed  object  approaches  the  limit  at  unity,  that  the  eccentricity  cutoff 
should  be  0.5.  Because  there  is  no  numerically  defined  cutoff  for  a  sunspot  who  is 
asymmetric  vs.  symmetric,  this  particular  section  of  the  classification  was  highly 
unspecific.  The  value  of  eccentricity  used  to  classify  each  spot  group  was  therefore 
altered  between  0.1  and  0.9  to  find  the  minimum  variation  point.  The  variation 
became  minimum  at  an  eccentricity  value  of  0.8,  shown  in  Table  11.  This  indicates 
that  in  general,  Holloman  observers  require  a  spot  to  achieve  an  eccentricity  of  0.8 
or  higher  before  labeling  that  sunspot  group  to  be  asymmetric.  Interestingly,  the 
variance  dips  on  the  other  side  of  0.5  as  well,  but  to  a  lesser  degree.  This  indicates 
that  the  worst  value  to  choose  for  eccentricity  happened  to  be  0.5  for  the 
comparison  to  Holloman. 

4.6.3  Largest  Penumbra  Length  Optimization. 

The  final  bias  established  was  with  respect  to  the  largest  spot  length  in  the 
penumbra  classification,  also  in  the  second  letter.  Typically,  the  cutoff  between 
large  and  small  penumbra  is  2.5  heliographic  degrees,  but  the  difficulties 
surrounding  this  measurement  have  already  been  discussed  at  length.  Given  that 
the  smallest  increment  on  the  layover  is  in  units  of  10  degrees,  it  comes  as  no 
surprise  that  there  would  be  variance  with  respect  to  this  classification  (the  cutoff  is 
j  the  smallest  measurable  distance).  In  this  case,  the  test  was  started  at  a  value  of 
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Eccentricity 
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0.2 
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0.9 

F 

Variance 
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17.25 

E 

Variance 
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13.68 

15.13 
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14.24 

13.88 
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D 

Variance 
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9.89 

9.99 

10.03 

9.39 

9.28 

9.24 
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8.45 
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00 
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Variance 
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7.61 

7.61 

7.61 

7.44 

7.36 

Total  Weighted 
Variance 

12.48 

12.47 

12.48 

12.77 

14.09 

13.21 

12.64 

12.44 

12.48 

91 


2.0°  and  the  spot  length  cutoff  was  increased  by  increments  of  0.1°  to  find  a 
minimum  value.  After  reaching  a  length  of  3.0°,  the  trend  of  minimizing  variation 
was  seen,  so  an  additional  1°  was  tested.  At  the  end  of  these  runs,  the  variation 
continues  to  decrease  (Table  12),  so  it  can  be  concluded  that  the  bias  is  greater  than 
3.9°.  This  is  a  significant  variation  from  the  actual  cutoff,  but  was  a  likely  result  of 
the  interpolation  performed  during  the  classification  step  using  the 
latitude /longitude  layover. 
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Table  12.  Biased  weighted  sample  variation  for  the  second  letter  of  the  McIntosh  classification  with  respect  to  penumbra  length 
of  the  largest  spot  over  the  6  July  -  31  December  data  set.  The  variance  of  this  letter  is  broken  out  into  Zurich  classifications 
the  same  as  the  previous  two  tables.  The  final  column  displays  the  value  of  spot  length  used  in  the  classification  process  for 
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V.  Conclusions 


A  new  fully  automatic  sunspot  detection  and  McIntosh  classification  method 
for  SDO  HMI  imagery  implemented  in  MATLAB  has  been  presented.  Processing 
both  HMII  and  HMIM  full  disk  images,  results  were  obtained  that  align  with  the 
difference  between  an  established  standard  comparison  defined  by  the  juxtaposition 
of  published  reports  from  two  standard  sources  of  sunspot  data:  Holloman  AFB  and 
the  Space  Weather  Prediction  Center. 

5.1  Summary  of  Results 

A  fully  automated  image  processing  program  was  created  to  extract 
information  about  detected  sunspots  on  the  SDO  HMI  imagery.  This  program  uses 
a  single  time  step  to  calculate  the  position  of  sunspot  groups  detected  through  an 
iterative  global  thresholding  method.  In  conjunction  with  magnetic  field  data, 
sunspots  are  grouped  into  their  proper  regions  using  an  iterative  region  assignment 
process.  Important  sunspot  parameters  are  extracted  and  processed  through  a 
linear  logical  sequence  designed  to  mimic  the  classification  process  outlined  by 
[McIntosh,  1990]. 

Consistent  results  are  shown  for  the  detection  of  all  visible  sunspots,  as  well  as 
calculation  of  sunspot  area,  group  number  and  spot  number.  Missing  spot 
percentages  comparing  Holloman  to  SWPC  numbered  at  11.5%  whereas  the  reverse 
comparison  gave  a  miss  percentage  of  8.5%.  Comparisons  to  the  SDO  code  result 
yielded  a  miss  percentage  of  6.4%  with  respect  to  SWPC  and  3.8%  with  respect  to 
Holloman  indicating  a  lower  miss  percentage.  Comparison  of  total  calculated 
sunspot  area  yielded  fit  result  R  squared  values  between  the  three  data  sources  of 
0.749  (Holloman  to  SDO),  0.778  (Holloman  to  SWPC),  and  0.780  (SDO  to  SWPC). 
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SDO  total  calculated  area  was  generally  less  than  area  reported  by  SWPC  and 
Holloman.  Comparison  of  total  number  of  groups  between  the  three  data  sources 
yielded  fit  result  R  squared  values  of  0.705  (Holloman  to  SDO),  0.703  (Holloman  to 
SWPC),  and  0.722  (SDO  to  SWPC).  SDO  reported  number  of  groups  was  generally 
more  than  those  reported  by  Holloman  and  SWPC.  Comparison  of  total  number  of 
sunspots  detected  yielded  fit  result  R  squared  values  of  0.720  (Holloman  to  SDO), 
0.712  (Holloman  to  SWPC),  and  0.738  (SDO  to  SWPC).  The  total  number  of  spots 
determined  by  the  SDO  code  was  significantly  higher  than  either  Holloman  or 
SWPC. 

By  comparing  positions  of  detected  sunspot  groups,  McIntosh  classification 
results  are  shown  with  differences  comparable  to  the  difference  between  Holloman 
and  SWPC,  illustrated  through  a  three  way  analysis  of  weighted  biased  variation. 
The  classification  accuracy  was  partitioned  into  three  separate  categories:  direct, 
intermediate  and  relaxed.  While  the  direct  comparison  yielded  low  results  between 
all  three  reporting  entities,  the  relaxed  metric  of  comparison  showed  uncorrected 
match  percentages  of  87.67%  for  SWPC  to  Holloman,  83.80%  for  SWPC  to  SDO, 
and  80.65%  for  SDO  to  Holloman.  Comparing  SWPC  to  Holloman,  The  algorithm 
was  developed  on  a  test  set  of  images,  spanning  approximately  six  months  from  6 
July  2012  through  31  December  2012,  to  provide  comparable  results  to  our  standard 
sources.  Subsequently  run  on  an  additional  6  months  of  data  from  1  January  2013 
to  29  June  2013  to  show  consistency,  the  algorithm  yielded  similar  results  to  the 
first  6  months  tested. 

After  establishing  the  accuracy  of  the  detection  and  classification  method, 
parameters  were  perturbed  by  incremental  amounts  to  show  variation  in  the  match 
percentages  compared  to  our  two  standard  sources.  It  was  shown  that  match 
percentages  reach  maximum  values  for  particular  values  of  classification  cutoffs. 
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Parameters  relating  to  group  length,  spot  length  and  spot  eccentricity  were  tested 
to  determine  the  bias  associated  with  each  item  for  Holloman  AFB.  For  group 
length,  it  was  shown  that  compared  to  the  regular  5°,  10°,  and  15°  cutoffs  for  ‘D’, 
‘E’,  and  ‘F’  class  sunspots,  Holloman  biased  their  classifications  to  align  with  group 
lengths  of  7.5°,  12.5°,  and  17.5°.  For  spot  length  compared  to  2.5°,  the  bias  was 
found  to  be  greater  than  3.9°.  For  spot  eccentricity  compared  to  a  standard  defined 
to  be  0.5,  the  bias  was  found  to  be  closer  to  an  eccentricity  of  0.8. 

5.2  Future  Work 

Further  work  should  be  done  to  demonstrate  the  stability  of  the  SDO  code 
with  respect  to  detection  and  grouping.  This  can  be  addressed  by  operating  the 
code  on  a  higher  cadence  of  data  to  produce  time  dependent  values  of  sunspot  area, 
number  of  spots,  number  of  groups,  spot  length,  McIntosh  classification,  etc.  This 
may  uncover  inaccuracies  that  could  be  contributing  to  classification  errors,  most 
likely  in  the  spot  finding  stage.  If  found,  more  work  can  be  done  to  improve  the 
stability  of  the  drawing  process.  In  addition,  more  work  should  be  done  to  look  into 
the  predictive  capabilities  of  the  code. 

Alternate  processes  for  sunspot  grouping  and  classification  have  been  explored 
as  outlined  by  Aschwanden  [ Aschwanden ,  2010].  These  processes  should  be  explored 
for  potential  expansion  of  the  SDO  code  as  well  as  a  comparison  between  the 
different  methods.  New  methods  for  grouping  detected  sunspots  may  significantly 
improve  error.  Watershed  or  neural  networking  methods  may  be  able  to  serve  this 
purpose  while  the  original  purpose  of  the  SDO  code  is  preserved.  Additionally, 
incorporating  region  number  data  into  the  grouping  process  may  significantly 
improve  classification  match  percentages  by  ensuring  the  propper  groups  are 
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matched.  Determining  a  better  way  to  match  groupings  between  the  SDO  output 
and  SWPC  or  Holloman  may  have  a  large  effect. 

A  more  in  depth  look  at  the  second  and  third  letter  accuracy  may  show  that 
these  sections  of  code  need  revision.  The  comparison  metric  selected  for  use  in  this 
research  specifically  reflected  a  direct  comparison  as  well  as  broader  categories  that 
mostly  favored  an  evaluation  of  the  Zurich  classification.  These  specifics  should  be 
addressed  to  ensure  complete  accuracy  for  all  of  the  three  letters.  The  type  of 
comparison  used  in  this  research  can  still  be  applied  for  other  comparison  metrics. 
Other  similar  metrics  may  therefore  be  developed  to  look  at  these  additional 
features  in  the  McIntosh  classification. 

A  mechanism  for  establishing  AF  observatory  bias  has  been  demonstrated,  but 
this  concept  can  be  readily  applied  to  all  sunspot  parameters  and  Learmonth  and 
San  Vito  observatories.  A  more  in  depth  study  of  the  bias  should  also  be  applied  to 
Holloman  as  this  can  lead  to  a  more  precise  determination  of  the  bias  in  addition  to 
a  quantification  of  some  additional  parameters.  There  is  also  a  potential  to 
automate  this  bias  evaluation  given  the  code’s  capacity  to  operate  using  several 
different  patterns  at  once. 

Additional  tests  to  improve  the  statistical  significance  of  the  code  result 
should  be  accomplished.  While  the  SDO  satellite  has  only  been  in  use  since  2010, 
the  general  nature  of  the  code  allows  for  adaptation  to  other  data  sources  including 
SOHO  imagery.  This  may  enable  easier  comparison  to  other  automated 
classification  schemes  that  may  have  been  designed  to  function  other  sources  of  data. 
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Appendix  1.  Holloman  Classification  Method 


A.l  From  Drawing  to  Classification 

The  AFMAN15-124  sets  standards  for  the  reporting  of  classified  sunspot 
groups  that  mirror  guidelines  for  combined  Mclntosh/Mt.  Wilson  classification 
schemes  [USAF,  2013].  All  hand  classifications  are  carried  out  under  the  guidelines 
set  by  both  the  Air  Force  manual  and  the  schoolhouse  at  Holloman  AFB,  but 
interpretation  of  those  guidelines  often  cause  different  opinions  on  the  classification 
of  even  the  simplest  spot  groups.  Therefore,  it  was  decided  to  contrast  the  method 
of  automatic  classification  with  the  method  of  hand  classification  [USAF,  2013]  over 
a  direct  comparison  between  the  two,  as  there  can  be  large  dissent  between 
observers  about  the  classification  of  a  particular  spot  group  (even  after  published 
spot  reports  are  referenced).  This  difference  are  highlighted  when  comparing 
Holloman  to  SWPC  in  Section  4.4.1. 

A. 2  Drawing  Procedures 

The  solar  analyst  starts  the  drawing  process  by  initiating  the  java  program 
scripted  to  communicate  with  the  SOON  telescope  used  for  continual  observation  of 
the  sun.  The  SOON  telescope  is  used  at  every  Air  Force  Optical  Solar  Observatory. 
This  java  program  uses  the  Zulu  time,  calculated  from  the  computer’s  internal  clock 
which  may  need  to  be  adjusted  by  the  observer,  to  pull  important  information  from 
the  solar  ephemeris.  Specifically,  the  solar  analyst  needs  both  the  P  and  the  B  angle 
to  perform  the  drawing.  The  P  angle  is  the  sun’s  rotation  around  the  center  of  the 
disk  with  respect  to  the  earth  while  the  B  angle  is  the  sun’s  tilt  either  towards  or 
away  from  the  earth  [Seidelmann  and  Urban,  2010].  These  angles  are  simply  a 
consequence  of  our  orbit  around  the  sun  coupled  with  the  tilt  in  our  axis  of  rotation 


and  the  sun’s  tilt  in  axis  of  rotation.  They  are  necessary  to  calculate  in  order  to  put 
all  observers  at  the  same  reference  angle  with  solar  north  at  the  top  of  the  drawing. 
Once  recorded  on  the  Air  Force  Weather  Agency  (AFWA)  Form  21,  the  solar 
analyst  takes  the  form  to  the  white  light  board  on  the  SOON  telescope. 


Figure  25.  AFWA  form  used  to  record  the  position  of  sunspots  and  code  in  the  sunspot 
report  from  an  AF  Solar  Observatory.  This  form  is  placed  on  the  drawing  board  of  the 
SOON  telescope  and  all  spots  are  recorded  within  the  18  cm  diameter  circle  on  the 
left  hand  side.  Then,  the  classification  process  is  completed  and  recorded  on  the  right 
hand  side  before  coding  into  a  sunspot  report. 


The  Form  21  is  an  elongated  sheet  of  paper  incorporating  both  a  section  for  drawing 
sunspots  projected  onto  the  white  light  board  inside  an  18  cm  circle,  as  well  as  a 
section  for  encoding  the  observation  into  a  spots  message  after  the  drawing  has  been 
completed.  In  order  to  correct  for  the  P  angle  rotation,  the  analyst  uses  the 
rounded  number  taken  from  the  observing  program  and  marks  the  hatch  mark  on 
the  side  of  the  drawing  disk  corresponding  to  the  proper  P  angle.  On  the  Western 
limb  of  the  disk  on  the  Form  21,  there  are  positive  and  negative  rotation  angles  to 
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which  the  analyst  must  attempt  to  match  the  P  angle  to  the  best  of  his  or  her 
ability,  and  the  angles  along  the  side  of  the  drawing  go  to  positive  and  negative  30 
as  the  P  angle  can  range  between  ±26.4°  visible  in  figure  25.  From  this  point,  the 
analyst  places  the  paper  onto  the  white  light  board,  orienting  the  paper  with  the 
line  through  the  appropriate  P  angle,  perpendicular  to  the  base  of  the  drawing 
board.  By  rotating  the  paper  on  the  drawing  board  in  this  way,  the  P  angle  rotation 
is  corrected,  and  no  rotation  about  the  center  of  the  disk  is  present  in  the  drawing, 
at  least  to  the  accuracy  that  the  Analyst  can  orient  the  drawing  on  the  board. 

The  image  is  subsequently  focused  on  the  drawing  board.  Because  of  seeing 
conditions  varying  throughout  the  day,  as  well  as  variation  between  days,  it  may  be 
necessary  to  re-focus  the  drawing  board  to  get  the  most  accurate  representation  of 
the  spot  region  being  drawn.  Often  times,  the  seeing  conditions  will  prevent  a  sharp 
image  from  being  formed,  but  it  is  the  analyst’s  job  to  get  the  best  picture  possible 
to  most  accurately  represent  the  sun  at  the  observation  time.  This  focusing 
endeavor  is  performed  by  sliding  the  board  along  a  rail  that  runs  the  length  of  the 
telescope  optics. 

Because  the  seeing  conditions  and  the  heat  index  of  the  day  will  alter  the  focal 
length  of  the  mirror,  the  analyst  adjusts  the  board  to  sit  at  the  approximate 
location  of  the  focal  length.  Depending  on  the  change  in  conditions  from  the 
previous  drawing,  it  may  be  unnecessary  to  re-focus  the  image.  The  next  step  for 
the  analyst  to  take  is  to  record  the  start  time  of  the  drawing  on  the  bottom  of  the 
form  (there  is  no  line  on  the  form  for  this  recording).  Both  the  start  and  stop  time 
of  the  drawing  will  be  averaged  in  order  to  determine  the  approximate  time  the 
drawing  was  done.  This  information  isn’t  necessarily  important  for  SWPC,  but  is 
important  in  order  to  compare  the  Observatory  method  to  any  other  spot 
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Figure  26.  A  technician  stands  in  front  of  the  drawing  board  on  the  Holloman  SOON 
telescope.  White  light  is  reflected  down  the  length  of  the  telescope  to  the  45°  mirror. 
This  image  is  then  reflected  down  on  the  drawing  board  where  solar  analysts  record 
all  the  sunspots  visible  on  the  AFWA  form  21  from  Figure  25.  The  guide  instrument 
is  visible  above  the  45°  mirror  while  the  rail  along  which  the  drawing  board  can  slide 
can  be  seen  with  a  measuring  tape  running  its  length. 


classification  method  as  the  observation  time  needs  to  be  synchronized  to  minimize 
error  that  may  be  associated  with  timing. 

After  the  time  is  recorded,  the  analyst  begins  the  drawing.  Before  actually 
putting  pencil  to  paper,  a  high  gloss  sheet  of  paper  is  held  by  the  analyst  and 
moved  in  swift,  tight  circles  over  the  top  of  the  Form  21.  The  analyst  uses  this 
technique  in  order  to  visibly  distinguish  between  spots  visible  on  the  sun  and  any 
paper  defects  on  the  form.  If  a  defect  is  seen,  a  small  “x”  is  placed  over  the  mark  in 
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order  to  prevent  confusion  later  on  when  determining  the  classification  of  each  spot 
group.  Next,  using  dabbing  strokes  in  order  to  represent  each  spot,  the  analyst 
outlines  each  spot  region  containing  penumbra,  and  fills  in  each  umbra  using  a 
pencil.  This  dabbing  motion  is  adopted  because  the  object  being  outlined  is 
reflected  off  the  mirror  directly  above  the  paper.  When  drawing,  the  analyst’s  pencil 
generally  will  cast  a  shadow  over  the  region  he  or  she  is  currently  addressing,  and 
this  prevents  the  outline  of  the  spot  group  from  being  seen  while  actually  drawing 
on  the  paper.  In  addition  to  this  projective  hindrance,  the  atmospheric  conditions 
of  the  day  may  also  cause  turbulence  that  shakes  the  projection  of  the  spots  on  the 
white  light  board.  Therefore,  the  analyst  must  do  their  best  to  accurately  represent 
a  moving  target  on  the  stationary  paper.  This  vibration  effect  is  considerable  near 
the  edge  of  the  drawing  as  small  variation  on  the  paper  in  that  region  corresponds 
to  larger  variation  in  longitude  or  latitude.  A  quick  calculation  shows  that 
variations  of  a  single  millimeter  in  the  East/West  direction  on  the  drawing  board 
correspond  to  a  minimum  longitude  error  of  0.3°  in  the  center  of  the  disk  out  to  5.5° 
error  on  the  limb,  illustrating  that  this  may  be  a  significant  source  of  error  in  the 
drawing  process.  Because  of  these  atmospheric  and  vibrational  effects,  a  dabbing 
motion  tends  to  work  the  best  in  order  to  correctly  draw  all  the  spots  on  the  sun. 
After  each  spot  is  drawn,  the  high  gloss  paper  is  used  again  to  check  how  well  each 
spot  has  been  represented  on  the  form.  Any  errors  are  erased  and  re-drawn  on  the 
form  before  moving  the  next  step. 

A. 3  Magnetic  Maps 

In  general,  an  analyst  can  visually  group  spots  they  have  drawn  on  the  form 
solely  based  on  their  proximity  to  other  spots.  However,  it  may  sometimes  become 
necessary  to  choose  the  groupings  of  spot  regions  based  off  the  magnetic  polarity  of 
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those  regions.  In  this  case,  the  analyst  must  run  a  polarity  diagnostics  program, 
generally  referred  to  as  the  “magnetic  map”  of  the  region  in  question,  in  order  to 
determine  the  magnetic  polarity  of  that  region  and  the  surrounding  spots  [USAF, 
2013].  This  is  usually  done  anyway  because  the  magnetic  field  lines  are  often  needed 
to  perform  the  Mt.  Wilson  classification  (unused  in  this  research).  The  diagnostic 
tool  takes  advantage  of  the  fact  that  light  coming  from  regions  of  opposite  magnetic 
polarity  will  be  polarized  in  reverse  directions.  The  analyst  therefore  uses  a 
polarized  lens  placed  on  the  front  of  the  SOON  telescope  to  determine  the  polarity 
of  light  coming  from  each  region.  Regions  usually  take  about  5  minutes  to  process, 
so  the  magnetic  map  is  generally  run  while  performing  the  drawing  in  order  to  best 
capture  the  magnetic  field  configuration  of  the  spot  groups  when  the  drawing  was 
completed.  This  process  can  be  lengthy,  however,  sometimes  spanning  over  an  hour 
depending  on  how  many  active  regions  need  to  be  processed.  In  addition,  the 
diagnostic  tool  can  only  accommodate  up  to  6  regions  at  a  time.  If  more  than  6 
active  regions  are  visible  on  the  sun,  obtaining  a  magnetic  map  for  each  becomes 
very  time  consuming.  Because  each  map  is  accomplished  separately,  these  magnetic 
field  representations  obtained  by  the  program  are  not  synchronized  in  time,  nor  do 
they  line  up  with  the  observation  time  at  which  the  drawing  was  performed.  This 
fact  is  important  to  remember  when  actually  performing  the  classification  as  the 
magnetic  field  lines  may  not  actually  line  up  with  the  outline  of  a  spot  group, 
potentially  introducing  a  source  of  error  in  the  determination  of  the  polarity  of  a 
spot.  It  should  also  be  mentioned  that  small  groups  are  generally  not  analyzed 
using  a  magnetic  map.  This  means  that  the  magnetic  polarity  of  these  regions  is 
sometimes  assumed  and  not  actually  determined  using  the  telescope.  When 
matching  group  classifications  later  on  therefore,  it  is  possible  that  some  groups 
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were  assumed  to  be  unipolar  or  bipolar  because  they  were  small,  but  the  true 
polarity  may  be  different. 

A. 4  Positions  and  Grouping 

After  the  completion  of  the  magnetic  map,  the  analyst  can  move  into  grouping 
each  spot.  Groupings  of  this  nature  are  determined  based  on  the  McIntosh  paper, 
outlined  again  in  the  AFMAN15-124  [  USAF ,  2013],  although  the  solar  analyst 
generally  uses  these  same  guidelines  when  grouping  by  simple  observation  even 
though  the  groups  may  be  obvious  to  a  seasoned  observer.  This  grouping  technique 
is  performed  by  first  collecting  all  the  spots  within  3  heliographic  degrees  from  a 
leader  spot  and  subsequently  accepting  spots  generally  no  more  than  20 
heliographic  degrees  from  the  leader  spot,  depending  on  the  polarity  of  any  of  the 
trailing  spots.  Visually,  the  analyst  makes  these  determinations  and  can  separate 
groups  that  are  close  in  proximity  with  dashed  lines  on  the  drawing,  making  it 
easier  to  look  back  later  and  understand  the  decisions  made  in  the  classification 
step.  In  addition,  SWPC  will  sometimes  direct  observation  of  specific  boxes  on  the 
Sun  where  there  are  active  regions.  This  direction  provided  to  the  analyst  on  duty 
may  also  influence  the  sunspot  groupings. 

Once  the  groups  have  been  determined,  the  analyst  moves  into  determining 
the  position,  length  and  area  of  each  spot  group.  Generally,  the  position  of  each 
spot  is  determined  first  using  a  transparency  with  a  printed  grid  on  the  front,  shown 
in  Figure  27.  This  grid  must  be  angled  with  the  center  of  the  grid  at  a  higher  and 
higher  latitude  for  decreasing  B  angle.  If  the  B  angle  is  increasing,  the  layover  can 
be  flipped  upside  down  for  the  same  effect,  doubling  the  usefulness  of  each  layover. 
This  angling  is  because  as  the  spherical  sun  tilts  either  towards  or  away  from  our 
satellite,  the  zero  mark  for  latitude  will  no  longer  be  in  the  center  of  the  disk. 
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Figure  27.  Latitude  and  longitude  overlay  used  by  Holloman  Air  Force  Base  to  deter¬ 
mine  the  approximate  position  of  sunspots  drawn  on  the  18  cm  disk  represented  on  the 
AFWA  form  21  in  Figure  25.  This  particular  layover  is  only  applicable  for  drawings 
performed  when  the  B  angle,  or  the  angle  describing  the  tilt  of  the  sun  towards  or  away 
from  the  observer,  is  rounded  to  ±7°.  For  other  cases,  separate  latitude  and  longitude 
layovers  are  available  in  one  degree  increments,  noting  that  each  one  works  for  both 
positive  and  negative  tilt. 


Shifting  the  grid  either  up  or  down  will  correct  for  this  fact.  The  grid  used  by 
analysts  has  a  spacing  between  lines  of  10  heliographic  degrees,  meaning  that 
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greater  accuracy  requires  mental  interpolation  performed  by  the  analyst.  In 
addition  to  this  interpolation  requirement,  it  may  sometimes  be  difficult  to 
determine  where  the  center  of  the  group  is  on  the  disk.  Since  the  longitude  and 
latitude  of  the  spot  group  is  reported  for  the  centroid  of  the  group  only,  this 
determination,  coupled  with  the  10  degree  separation  of  grid  lines  on  the  18 
centimeter  wide  drawing  can  lead  to  some  significant  error  in  the  determination  of 
the  position  of  the  group.  The  length  of  each  group  is  determined  off  the  same 
layover  by  simply  picking  the  end  of  the  East- West  extent  of  the  group  and 
comparing  it  to  the  other  end.  Again,  if  the  end  of  the  group  does  not  line  up  with 
a  grid  line,  the  Analyst  will  interpolate  the  value  of  both  the  longitude  and  latitude 
to  get  the  approximate  location  of  the  end  of  the  group.  Interpolations  for  both 
length  and  position  must  be  done  to  the  nearest  whole  degree.  Positions  of  each 
group  are  written  in  adjacent  to  each  spot  group  with  lines  pointing  to  the  location 
on  the  drawing  that  the  analyst  has  determined  to  be  the  center  of  the  group. 

A. 5  Area  Determination 

After  the  length  and  position  of  each  group  has  been  determined,  the  analyst’s 
next  step  is  to  determine  the  approximate  area  of  each  group  using  a  layover  similar 
to  the  longitude/latitude  situation.  In  this  case,  the  transparent  layover  has 
representations  of  different  shapes  of  spots  as  drawn  on  an  18  cm  diameter  disk 
matching  the  one  printed  on  the  Form  21,  seen  in  Figure  28.  These  spots  are 
representations  of  best  fit  ellipses  encompassing  the  proper  amount  of  area  in  units 
of  millionths  of  a  solar  hemisphere.  At  this  point,  the  analyst’s  job  is  to  determine 
which  spot  outline  on  the  layover  best  fits  the  spot  they  have  drawn  on  the  page, 
annotating  on  the  form  the  corresponding  area  of  that  fitted  spot  outline  in 
millionths  of  a  solar  hemisphere.  For  smaller  spot  groups,  the  size  of  outlined  spots 
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on  the  layover  increases  slightly  by  10  millionths  of  a  solar  hemisphere.  However,  for 
larger  groups,  the  difference  between  one  size  and  the  next  step  up  can  be  as  great 
as  200  millionths  of  a  solar  hemisphere.  If  a  spot  does  not  fit  completely  within  the 
layover,  analysts  often  use  a  series  of  smaller  ellipses  and  add  them  up  to  arrive  at  a 
final  area  value  for  the  group. 
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Figure  28.  Area  overlay  used  by  Holloman  Air  Force  Base  to  determine  the  approximate 
area  of  the  sunspots  drawn  on  the  18  cm  disk  represented  on  the  AFWA  form  21  in 
Figure  25.  Transparency  is  placed  over  the  form  and  the  best  fit  circle  is  determined 
visually.  The  corresponding  area  is  recorded  before  being  multiplied  by  the  geometric 
foreshortening  correction  factor  from  Figure  29. 
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After  the  approximate  area  is  determined  using  the  area  overlay,  the  solar 
analyst  uses  another  overlay  to  finalize  the  reported  area.  This  overlay  displays  a 
geometric  foreshortening  factor  based  on  position  away  from  the  center  of  the  disk. 
While  not  certain,  it  is  believed  that  the  layover  was  created  by  using  a  co^ 
relationship.  This  factor  is  multiplied  by  the  area  result  obtained  from  the  area 
overlay.  The  solar  analyst  determines  the  approximate  position  of  the  center  of  the 
sunspot  group  and  finds  the  corresponding  mark  for  foreshortening  correction  on  the 
layover,  seen  in  Figure  29.  If  the  approximate  position  lies  between  two  markers  on 
the  foreshortening  overlay,  the  observer  rounds  in  towards  the  center  of  the  overlay 
to  get  the  number  to  multiply  with  the  area  from  the  previous  step. 

In  this  way,  the  group  area  is  approximated  based  on  the  visual  observations 
of  each  analyst.  Once  these  preliminary  steps  are  completed,  the  solar  analyst  can 
begin  classifying  each  spot  group.  These  classifications  are  completed  using  the 
guidelines  developed  by  McIntosh  and  restated  in  the  AFMAN15-124  [ USAF ,  2013]. 
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Figure  29.  Foreshortening  overlay  used  by  Holloman  Air  Force  Base  to  determine  the 
approximate  position  of  sunspots  drawn  on  the  18  cm  disk  represented  on  the  AFWA 
form  21.  After  determining  the  area  of  the  sunspot  group  using  the  layover  in  Figure 
28,  the  resulting  number  is  multiplied  by  the  foreshortening  factor  obtained  on  this 
layover.  If  a  spot  group  lies  in  the  center  of  two  markers,  the  number  closer  to  the 
center  of  the  layover  is  used  without  interpolating.  Additionally,  it  is  important  to  note 
that  the  foreshortening  factor  does  not  increase  over  a  value  of  3  for  high  longitudes. 
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Appendix  2.  Derivation  of  Longitude  and  Latitude 

Coordinates 


To  derive  the  relationship  between  pixels  in  an  image  and  points  on  the  solar 
sphere,  a  two  dimensional  picture  is  addressed  first.  The  derivation  of  this  result 
was  directed  by  the  National  Solar  Observatory  (NSO)  and  can  be  seen  in  Smart’s 
Text  on  Spherical  Astronomy  [Smart,  1947]. 


Figure  30.  Projecting  the  line  of  sight  between  the  SDO,  at  a  distance  R  from  C,  and 
point  L  onto  some  point  T  on  the  two  dimensional  plane  where  TSDO  is  tangent  to  the 
edge  of  the  sun,  and  point  A  at  some  point  interior  to  the  edge  of  the  sun  onto  some 
point  S,  two  triangles  are  formed.  Using  similar  triangles  described  by  angles  p8  and 
pi,  a  relationship  is  drawn  to  find  the  angle  between  the  center  of  the  sun  with  respect 
to  the  observation  point  of  the  SDO  and  the  point  A  at  which  any  given  spot  on  the 
surface  of  the  sun  exists.  This  angle  p  is  later  used  to  determine  both  the  longitude 
and  latitude  of  the  spot  at  point  A  on  the  sun. 


First,  an  angle  ps  =  tan  1  ^  is  defined  where  r0  =  k[m^^ls]N  illustrated  in  Figure 
30.  N  is  defined  to  be  the  radius  of  the  sun  in  pixels.  Likewise,  p\  =  tan-1(^|) 
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where  ra  =  and  M  is  defined  to  be  the  pixels  from  the  center  point  C  to 

S.  In  this  case,  k  is  simply  a  constant  with  the  appropriate  units.  By  dividing  the 
two  expressions  into  each  other,  an  expression  between  p±  and  ps  is  obtained  where 
R,  the  distance  between  C  and  the  SDO  in  Figure  30,  cancels: 


tan(ps)  N 
tan(pi)  M 


(V) 


Using  the  law  of  sines,  an  expression  relating  pi  and  ps  to  p  is  determined: 


sin(180  —  pi  —  p)  sin(p!  +  p)  sin(p1) 
R  R  r 


(18) 


P 


Figure  31.  In  looking  at  the  sun  from  the  SDO  perspective,  the  base  legs  of  the  two 
like  triangles  are  seen  from  Figure  30  in  lengths  SC  and  TC.  Angle  ZSCP  describes 
9 ,  and  in  this  figure,  the  P  angle  has  been  taken  out  as  all  SDO  images  are  without  P 
angle. 


Ill 


Next,  using  the  first  relation  obtained  in  Equation  17,  an  expression  relating 
pi  and  ps  to  the  distance  to  S  from  C  in  both  x  and  y  coordinates  is  solved.  This  is 
done  through  simple  trigonometric  relations  and  the  Pythagorean  Theorem: 


tan(Pl)  =  taiy*)  \J VJ  +  M*. 
ta„(p,)  =  M,  =  tMe) 


(19) 

(20) 


ta  n(p„)  Ml 

where  9  is  represented  in  Figure  31.  Next,  noticing  triangle  AOP  on  the  sphere, 
shown  in  Figure  32,  with  side  lengths  p,  90  —  B,  90  —  Bq,  the  spherical  law  of  sines 
is  used  to  arrive  at  the  following  relationship: 


siii™-lo)  =  2((0-  pj  =>  sm(p)  mn(ff  -  Po)  =  Sin(L  ~  Lo)  C0S(B)  (21) 

Next,  variables  in  terms  of  known  values  (includeing  p,  9 ,  and  the  B  angle)  are 
isolated  for  substitution.  Therefore,  two  separate  cases  are  considered  to  which  the 
spherical  law  of  cosines  can  apply.  The  Erst  is  for  any  point  visible  on  the  disk  when 
the  distance  between  O  and  P  is  less  than  90  degrees,  shown  in  Figure  32.  From  the 
law  of  cosines 


cos(p)  =  cos(90  —  B)  cos(90  —  B0)  +  sin(90  —  B )  sin(90  —  B0)  cos (L  —  L0)  (22) 

so  that  substituting  in  for  9  —  P0  gives  an  expression  for  both  longitude  and  latitude 
with  no  other  unknown  variables.  This  process  is  similar  for  the  second  case  where 
the  distance  between  O  and  P  is  greater  than  90  degrees,  resulting  in  the  final 
expression  for  relating  all  three  known  components  and  the  longitude  and  latitude. 
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Figure  32.  The  three  dimensional  projection  of  the  two  dimensional  sun  is  represented 
in  terms  of  longitude  (seen  in  red)  and  latitude  (seen  in  blue).  For  some  sunspot  at 
point  A  on  the  two  dimensional  sun,  both  longitude  and  latitude  can  be  found  by 
determining  their  relationship  to  p  (described  as  the  angle  between  O  and  A  from  an 
observer  off  the  sphere  on  a  ray  from  C  through  O)  and  9  (described  as  ZACO)  through 
the  spherical  law  of  sines.  Next,  the  spherical  law  of  cosines  is  applied  to  retrieve  each 
variable  in  terms  of  the  center,  radius,  and  angle  off  center,  all  found  through  the  edge 
finding  routine  applied  in  the  beginning  stages  of  the  code. 
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cos (B)  cos (L  —  L0)  =  [cos(-Bq)  cos(p)  ±  sin(50)  sin(p)  cos {6  —  P0)]  (23) 


After  defining  each  of  these  relationships,  it  is  now  possible  to  take  points  in 
the  Cartesian  plane  of  the  image  and  place  them  onto  the  three  dimensional  sphere 
of  the  sun.  Some  additional  logic  is  required  to  ensure  that  the  proper  expression  is 
being  used  in  any  given  situation  because  of  the  cosine  ambiguities  involved  in  the 
final  solution,  but  this  is  simply  incorporated  into  the  mapping  function.  In  this 
way,  given  a  point  on  the  image  and  the  time  at  which  the  image  was  taken,  both 
the  solar  ephemeris  calculations  and  the  point  mapping  function  are  combined  to 
get  a  position  in  longitude  (shown  in  red)  and  latitude  (shown  in  blue),  visible  in 
Figure  32.  It  should  be  noted  that  the  illustration  of  the  point  mapping  scheme 
shown  in  Figure  32  are  left  in  the  most  general  form.  Published  SDO  images 
[Pesnell  et  al.,  2012;  Schou  et  al.,  2012;  Scherrer  et  al.,  2012]  are  already  corrected 
for  the  P  angle  (shown  in  orange)  making  this  value  zero  in  these  calculations. 
However,  the  B  angle  must  still  be  corrected  for,  illustrated  in  purple  in  Figure  32. 
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Appendix  3.  Code  Test  on  an  Additional  6  Months 


In  order  to  generate  additional  confidence  that  the  accuracy  of  the  SDO  code 
could  be  applied  to  all  potential  sunspot  configurations,  the  algorithm  was  run  on 
an  additional  6  months  of  data,  spanning  from  1  January  2013  to  29  June  2013. 
This  additional  test  was  run  to  show  accuracy  in  reported  area,  number  of  groups, 
number  of  sunspots,  and  classification. 

The  same  analysis  seen  in  Chapter  IV  can  be  applied  to  these  additional  plots 
and  they  are  included  for  completeness. 


Figure  33.  The  difference  between  the  daily  total  area  calculated  by  each  of  the  three 
reporting  entities  is  plotted  from  1  Jan  -  29  Jun  2013. 


The  difference  between  reported  area  calculations  follows  suit  with  the 
discussion  in  Chapter  IV.  There  are  no  reported  areas  of  significant  difference  and 
some  of  the  same  trends  are  visible.  Specifically,  Holloman  tends  to  have  an  area 
lower  than  that  of  either  SWPC  or  the  SDO,  although  this  is  not  always  the  case  as 
is  seen  in  Figure  33.  The  regression  lines  seen  in  the  subsequent  scatter  plots  further 
demonstrate  the  area  accuracy  achieved  by  the  SDO  code.  Figure  34  shows  a 
regression  line  slope  of  0.84  tilted  towards  the  Holloman  axis  and  an  R  squared 
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Figure  34.  Regression  of  SWPC  and  Holloman  total  area  from  1  Jan  -  29  Jun  2013. 


Figure  35.  Regression  of  SWPC  and  SDO  total  area  from  1  Jan  -  29  Jun  2013. 
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Figure  36.  Regression  of  SDO  and  Holloman  total  area  from  1  Jan  -  29  Jun  2013. 


value  of  0.753.  Figure  35  demonstrates  that  the  SDO  code  reports  consistent  areas 
coming  in  with  a  slope  value  of  1.04  and  a  high  R  square  value  of  0.8116.  The  final 
area  plot  in  Figure  36  shows  that  Holloman  areas  are  generally  higher  than  SDO 
areas  for  this  set  of  test  data  with  a  regression  slope  of  0.96  and  a  comparatively  low 
R  squared  value  of  0.643. 

Differences  between  reported  group  numbers  are  also  consistent  with  the  test 
data  shown  previously.  Separation  between  the  difference  lines  in  Figure  37  is  not 
anomalous  compared  to  the  group  separations  seen  previously.  Figure  38  shows  a 
regression  slope  of  0.88  and  good  agreement  on  the  R  squared  value  between  SWPC 
and  Holloman.  Number  of  groups  for  the  SDO  SWPC  comparison  is  skewed  to  0.67 
with  a  low  R  squared  value  in  Figure  39,  indicating  less  agreement  than  has  been 
previously  seen.  The  final  item  in  this  section,  Figure  40,  shows  slightly  better  R 
squared  agreement  at  0.627  but  a  more  promising  regression  slope  of  0.88. 
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Figure  37.  The  difference  between  the  daily  total  number  of  groups  determined  by 
each  of  the  three  reporting  entities  is  plotted  from  1  Jan  -  29  Jun  2013. 


Figure  38.  Regression  of  SWPC  and  Holloman  total  number  of  groups  from  1  Jan  -  29 
Jun  2013. 


Comparing  the  number  of  spots  calculated  by  the  SDO  code  again  reveals  the 
great  advantage  of  spatial  resolution  in  Figure  41.  Significant  separation  exists  for 
days  where  large  numbers  of  spots  are  seen  on  the  disk.  Figure  42  shows  the  slope 
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Figure  39.  Regression  of  SWPC  and  SDO  total  number  of  groups  from  1  Jan  -  29  Jun 
2013. 


of  regression  fit  tilted  in  no  particular  direction  with  a  good  slope  agreement  of  0.99 
and  an  R  squared  value  of  0.739.  Figure  43  on  the  other  hand  shows  significant  tilt 
towards  the  SDO  axis  as  expected  with  a  good  R  squared  fit  value  of  0.747.  Finally, 
the  regression  in  Figure  44  demonstrates  the  same  feature  with  tilt  of  0.38  towards 
the  SDO  axis  again.  The  R  squared  value  for  this  regression  is  fair  at  0.704. 
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Figure  40.  Regression  of  SDO  and  Holloman  total  number  of  groups  from  1  Jan  -  29 
Jun  2013. 


Figure  41.  The  difference  between  the  daily  total  number  of  sunspots  detected  by  each 
of  the  three  reporting  entities  is  plotted  from  1  Jan  -  29  Jun  2013. 
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Figure  42.  Regression  of  SWPC  and  Holloman  total  number  of  sunspots  from  1  Jan  - 
29  Jun  2013. 


Figure  43.  Regression  of  SWPC  and  SDO  total  number  of  sunspots  from  1  Jan  -  29 
Jun  2013. 
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Figure  44.  Regression  of  SDO  and  Holloman  total  number  of  sunspots  from  1  Jan  -  29 
Jun  2013. 
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Appendix  4.  Alternate  Image  Processing  Techniques 


There  are  many  other  image  processing  techniques  that  can  be  employed 
within  the  SDO  coding  framework.  Any  process  that  accomplishes  a  coding  step 
yielding  the  same  or  a  similar  result  can  be  considered,  especially  if  that  new  step  is 
faster  or  more  accurate.  Many  different  types  of  techniques  were  considered 
throughout  the  development  of  the  code,  but  often  times,  a  functional  method  was 
selected  for  simplicity  and  not  necessarily  optimization. 

To  illustrate,  an  example  alteration  is  executed  addressing  the  limb  darkening 
correction  step.  The  current  code  steps  involve  the  addition  of  an  inverted 
correction  image  (derrived  from  the  Eddington  appriximation)  to  the  grayscale  sun 
in  order  to  flatten  out  the  intensity  levels  of  the  sun.  This  process  is  described  in 
depth  in  Section  3.3.3. 

An  alternative  way  to  complete  this  step  would  be  to  multiply  a  correction 
function  into  the  grayscale  image  to  achieve  the  flattening.  The  theory  behind  this 
relates  to  the  concept  of  optical  depth.  The  derivation  of  the  Eddington  relation  (to 
the  first  order  approximation)  gives  the  intensity  profile  of  the  sun  for  varying 
angles  of  9  off  from  the  center  of  the  sun.  This  function  is  an  approximation  of  the 
intensity  drop  visible  on  the  photosphere,  but  if  it  is  inverted  and  multiplied  into 
the  grayscale  sun,  all  intensity  drop  off  with  angle  should  be  corrected  to  the  first 
order.  This  would  be  equivalent  to  having  a  uniform  intensity  and  multiplying  first 
by  some  intensity  profile  a  to  obtain  the  limb  darkened  sun  (the  starting  image)  and 
then  multiplying  by  the  -  in  order  to  re-scale  the  whole  multiplicative  factor  to  1. 

In  practice,  this  method  is  simple  enough  to  construct  because  the  limb 
darkening  function  is  already  known  to  first  order.  Figure  45  shows  the  results  of 
the  multiplication  of  the  inverse  Eddington  approximation  with  the  gray  sun  image 
before  any  correction  has  been  applied.  Figure  46  shows  the  limb  darkening 
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correction  obtained  through  the  addition  method  used  currently  in  the  code,  and 
there  is  some  clear  difference  on  the  limb  of  the  sun. 


Figure  45.  The  inverse  of  the  Eddington  approximation  function  is  multiplied  into  the 
grayscale  sun  to  correct  for  limb  darkening. 


Some  differences  are  clear,  including  a  faster  drop  off  on  the  multiplied  image 
near  the  very  edge  of  the  sun.  In  order  to  highlight  these  differences,  the  multiplied 
image  is  subtracted  from  the  added  image,  shown  in  Figure  47. 

The  difference  image  shows  some  higher  intensity  near  the  edge  of  the  sun, 
indicating  that  the  added  image  yields  higher  pixel  intensity  in  those  regions.  The 
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Figure  46.  The  inverse  of  the  Eddington  approximation  function  has  been  added  to  the 
grayscale  sun  in  this  case  to  correct  for  limb  darkening.  This  is  the  method  currently 
used  by  the  SDO  code. 

intensity  of  spots  in  those  regions  looks  to  be  visible  as  well,  indicating  that  there 
will  be  a  difference  in  the  spot  finding  if  the  multiplication  method  is  used.  The 
correction  may  improve  if  the  intensity  correction  function  is  found  to  a  higher 
order.  This  may  be  a  path  for  additional  research,  in  addition  to  other  image 
processing  techniques  that  may  be  used  to  speed  up  the  execution  of  the  code. 
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Figure  47.  The  multiplication  method  for  limb  darkening  correction  is  subtracted  from 
the  addition  method  used  by  the  SDO  code.  The  result  proves  to  have  higher  pixel 
intensity  near  the  edge  of  the  sun. 


126 


References 


Al-Omari,  M.,  R.  Qahwaji,  T.  Colak,  S.  Ipson,  and  C.  Balch,  Next-day  prediction  of 
sunspots  area  and  mcintosh  classifications  using  hidden  markov  models,  in 
CyberWorlds,  2009.  CW’09.  International  Conference  on,  pp.  253-256,  IEEE, 
2009. 

Aschwanden,  M.  J.,  Image  processing  techniques  and  feature  recognition  in  solar 
physics,  Solar  Physics,  262(2),  235-275,  2010. 

Benkhalil,  A.,  V.  Zharkova,  S.  Ipson,  and  S.  Zharkov,  Automatic  identification  of 
active  regions  (plages)  in  the  full-disk  solar  images  using  local  thresholding  and 
region  growing  techniques,  in  Proceedings  of  the  AISB,  vol.  3,  pp.  66-73,  2003. 

Benkhalil,  A.,  V.  Zharkova,  S.  Ipson,  and  S.  Zharkov,  Automatic  detection  of  active 
regions  on  solar  images,  in  Knowledge- Based  Intelligent  Information  and 
Engineering  Systems,  pp.  460-466,  Springer,  2004. 

Bornmann,  P.,  and  D.  Shaw,  Flare  rates  and  the  mcintosh  active- region 
classifications,  Solar  physics,  150(1-2),  127-146,  1994. 

Canny,  J.,  A  computational  approach  to  edge  detection,  Pattern  Analysis  and 
Machine  Intelligence,  IEEE  Transactions  on,  (6),  679-698,  1986. 

Colak,  T.,  and  R.  Qahwaji,  Automated  mcintosh-based  classification  of  sunspot 
groups  using  mdi  images,  Solar  Physics,  248(2),  277-296,  2008. 

Curto,  J.,  M.  Blanca,  and  E.  Martinez,  Automatic  sunspots  detection  on  full-disk 
solar  images  using  mathematical  morphology,  Solar  Physics,  250(2),  411-429, 
2008. 

De  Wit,  T.  D.,  Fast  segmentation  of  solar  extreme  ultraviolet  images,  Solar  Physics, 
239 ( 1-2),  519-530,  2006. 

Dclouillc,  V.,  B.  Mampaey,  C.  Verbeeck,  and  R.  de  Visscher,  The  spoca-suite:  a 
software  for  extraction  and  tracking  of  active  regions  and  coronal  holes  on  euv 
images,  arXiv  preprint  arXiv: 1208. 1483,  2012. 

Foukal,  P.  V.,  Solar  astrophysics,  Wiley-VCH,  2008. 

Gonzalez,  R.,  R.C.and  Woods,  and  S.  Eddins,  Digital  Image  Processing  Using 
MATLAB,  Gatesmark  Publishing,  United  States  of  America,  2009. 

Jewalikar,  V.,  and  S.  Singh,  Automated  sunspot  extraction,  analysis  and 
classification. 

Kiepenheuer,  K.,  Solar  activity,  The  Sun,  1,  322,  1953. 


127 


Lemen,  J.  R.,  The  atmospheric  imaging  assembly  (aia)  on  the  solar  dynamics 
observatory  (sdo),  Solar  Physics,  275(1-2),  17-40,  doi:10.1007/sll207-011-9776-8, 
2012. 

McIntosh,  P.  S.,  The  classification  of  sunspot  groups,  Solar  Physics ,  125(2), 
251-267,  1990. 

Meeus,  J.,  Astronomical  formulae  for  calculators,  Richmond,  Va.,  USA: 
Willmann-Bell,  cl 982.  2nd  ed.,  enl.  and  rev.,  1,  1982. 

Nguyen,  S.  H.,  T.  T.  Nguyen,  and  H.  S.  Nguyen,  Rough  set  approach  to  sunspot 
classification  problem,  in  Rough  Sets,  Fuzzy  Sets,  Data  Mining,  and  Granular 
Computing,  pp.  263-272,  Springer,  2005. 

Nguyen,  T.  T.,  C.  P.  Willis,  D.  J.  Paddon,  and  H.  S.  Nguyen,  A  hybrid  system  for 
learning  sunspot  recognition  and  classification,  in  Hybrid  Information  Technology, 
2006.  ICHIT’06.  International  Conference  on,  vol.  2,  pp.  257-264,  IEEE,  2006. 

Park,  D.-C.,  Sunspot  series  prediction  using  adaptively  trained  multiscale-bilinear 
recurrent  neural  network,  in  Computer  Systems  and  Applications  (AICCSA), 

2011  9th  IEEE/ ACS  International  Conference  on,  pp.  135-139,  IEEE,  2011. 

Pesncll,  W.  D.,  B.  Thompson,  and  P.  Chamberlin,  The  solar  dynamics  observatory 
(sdo),  Solar  Physics,  275(1-2),  3-15,  2012. 

Qahwaji,  R.,  and  T.  Colak,  Automatic  detection  and  verification  of  solar  features, 
International  Journal  of  Imaging  Systems  and  Technology,  75(4),  199-210,  2005. 

Qahwaji,  R.,  and  T.  Colak,  Neural  network-based  prediction  of  solar  activities, 
CITSA2006:  Orlando,  2006. 

Qahwaji,  R.,  and  T.  Colak,  Automatic  short-term  solar  flare  prediction  using 
machine  learning  and  sunspot  associations,  Solar  Physics,  2fl  (1),  195-211,  2007. 

Scherrer,  P.,  et  al.,  The  hclioseismic  and  magnetic  imager  (hmi)  investigation  for  the 
solar  dynamics  observatory  (sdo),  in  The  Solar  Dynamics  Observatory ,  pp. 
207-227,  Springer,  2012. 

Schou,  J.,  et  ah,  Design  and  ground  calibration  of  the  helioseismic  and  magnetic 
imager  (hmi)  instrument  on  the  solar  dynamics  observatory  (sdo),  in  The  Solar 
Dynamics  Observatory,  pp.  229-259,  Springer,  2012. 

Seidelmann,  P.  K.,  and  S.  Urban,  Explanatory  supplement  to  the  astronomical 
almanac,  in  Bulletin  of  the  American  Astronomical  Society,  vol.  42,  p.  522,  2010. 

Smart,  W.  M.,  Text-book  on  spherical  astronomy,  CUP  Archive,  1947. 


128 


Socas-Navarro,  H.,  Strategies  for  spectral  profile  inversion  using  artificial  neural 
networks,  The  Astrophysical  Journal ,  621  ( 1),  545,  2005. 

Thompson,  W.,  Coordinate  systems  for  solar  image  data,  Astronomy  and 
Astrophysics,  449  ( 2),  791-803,  2006. 

Turmon,  M.,  J.  Pap,  and  S.  Mukhtar,  Automatically  finding  solar  active  regions 
using  soho/mdi  photograms  and  magnetograms,  in  Structure  and  Dynamics  of  the 
Interior  of  the  Sun  and  Sun-like  Stars,  vol.  418,  p.  979,  1998. 

Turmon,  M.,  J.  Pap,  and  S.  Mukhtar,  Statistical  pattern  recognition  for  labeling 
solar  active  regions:  application  to  soho/mdi  imagery,  The  Astrophysical  Journal, 
568(1),  396,  2002. 

USAF,  Meteorological  codes,  AFMAN  15-124,  HQ  USAF/A30-W,  2013. 

Verbeeck,  C.,  P.  A.  Higgins,  T.  Colak,  F.  T.  Watson,  V.  Delouille,  B.  Mampaey,  and 
R.  Qahwaji,  A  multi- wavelength  analysis  of  active  regions  and  sunspots  by 
comparison  of  automatic  detection  algorithms,  Solar  Physics,  283 ( 1),  67-95,  2013. 

Watson,  F.,  and  L.  Fletcher,  Automated  sunspot  detection  and  the  evolution  of 
sunspot  magnetic  fields  during  solar  cycle  23,  Proceedings  of  the  International 
Astronomical  Union,  6( S273),  51-55,  2010. 

Watson,  F.  T.,  Investigating  sunspot  and  photospheric  magnetic  field  properties 
using  automated  solar  feature  detection,  Ph.D.  thesis,  University  of  Glasgow, 

2012. 

Wilson,  W.  H.,  Solar  ephemeris  algorithm,  SIO  Ref,  80,  13,  1980. 

Zharkov,  S.,  V.  Zharkova,  S.  Ipson,  and  A.  Benkhalil,  Technique  for  automated 
recognition  of  sunspots  on  full-disk  solar  images,  EURASIP  journal  on  applied 
signal  processing,  2005,  2573-2584,  2005. 


129 


Vita 


Second  Lieutenant  Gordon  Spahr  was  born  in  Oakland,  CA.  After  graduating 
from  Acalanes  ffigh  School  in  2008,  he  was  accepted  to  the  United  States  Air  Force 
Academy  as  part  of  the  Class  of  2012.  Graduating  with  a  Bachelor  of  Science 
Degree  in  Physics,  he  was  commissioned  into  the  United  States  Air  Force  from 
Cadet  Squadron  34. 

Lieutenant  Spahr’s  initial  assignment  upon  commissioning  was  to  the  Air  Force 
Institute  of  Technology  at  Wright  Patterson  AFB,  OH.  In  fall  of  2012,  he  entered 
the  Applied  Physics  program  to  obtain  a  Master’s  Degree,  concentrating  in  Solar 
Physics  and  Space  Weather.  LTpon  graduation,  Lieutenant  Spahr  will  be  assigned  to 
the  47th  Flying  Training  Wing  for  Specialized  LIndergraduate  Pilot  Training. 


130 


REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB  No.  0704-0188 


The  public  reporting  burden  for  this  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information,  including 
suggestions  for  reducing  this  burden  to  Department  of  Defense,  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports  (0704—0188),  1215  Jefferson  Davis  Highway, 
Suite  1204,  Arlington,  VA  22202—4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  any  penalty  for  failing  to  comply  with  a  collection 
of  information  if  it  does  not  display  a  currently  valid  OMB  control  number.  PLEASE  DO  NOT  RETURN  YOUR  FORM  TO  THE  ABOVE  ADDRESS. 


1.  REPORT  DATE  (DD-MM-YYYY)  2.  REPORT  TYPE 

27-03-2014  Master’s  Thesis 

4.  TITLE  AND  SUBTITLE 


3.  DATES  COVERED  (From  —  To) 
Aug  2012  —  Mar  2014 

I  5a.  CONTRACT  NUMBER 


Fully  Automated  Sunspot  Detection  and  Classification  Using  SDO  HMI 

Imagery  in  MATLAB 


I  5b.  GRANT  NUMBER 


I  5c.  PROGRAM  ELEMENT  NUMBER 


6.  AUTHOR(S) 


I  5 cl.  PROJECT  NUMBER 


Spahr,  Gordon  M.,  Second  Lieutenant,  USAF 


I  5e.  TASK  NUMBER 


I  5f.  WORK  UNIT  NUMBER 


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Air  Force  Institute  of  Technology 

Graduate  School  of  Engineering  and  Management  (AFIT /EN) 
2950  Hobson  Way 
WPAFB  OH  45433-7765 

9.  SPONSORING  /  MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

Capt.  Thomas  M.  Wittman 

Air  Force  Weather  Agency 

101  Nelson  Drive 

Offutt  AFB,  NE  68113 

DSN  271-0690,  COMM  402-294-0690 

Email:  2syosdor@offutt.af.mil 

12.  DISTRIBUTION  /  AVAILABILITY  STATEMENT 


8.  PERFORMING  ORGANIZATION  REPORT 
NUMBER 


AFIT-ENP-14-M-34 


10.  SPONSOR/MONITOR’S  ACRONYM(S) 

AFWA 

11.  SPONSOR/MONITOR’S  REPORT 
NUMBER(S) 


DISTRIBUTION  STATEMENT  A:  APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED. 

13.  SUPPLEMENTARY  NOTES 

This  material  is  declared  a  work  of  the  U.S.  Government  and  is  not  subject  to  copyright  protection  in  the  United  States 

14.  ABSTRACT 

An  automatic  sunspot  detection  and  classification  method  is  developed  combining  HMII  and  HMIM  imagery  procured  from  the  Solar 
Dynamics  Observatory.  Iterative  global  thresholding  methods  are  employed  for  detecting  sunspots.  Groups  are  selected  based  on 
heliographic  distance  between  sunspots  via  area-based  grouping  lengths.  Classifications  are  applied  through  logical  operators  adhering  to  the 
standard  McIntosh  classification  system.  Calculated  sunspot  parameters  and  classifications  are  validated  in  three  way  comparisons  between 
code  output,  Holloman  AFB  and  the  Space  Weather  Prediction  Center.  Accuracy  is  achieved  within  the  margin  of  difference  between 
Holloman  and  SWPC  reports  for  sunspot  area,  number  of  groups,  number  of  spots,  and  McIntosh  classification  using  data  spanning  6  July 
2012  to  29  June  2013:  SWPC/Holloman  (33.38%, 57.48%, 87.67%),  SWPC/SDO  (20.22%, 51.25%, 83.80%),  and  SDO/Holloman 
(24. 54%, 50. 91%, 80. 65%).  The  automatic  classification  system  is  used  to  evaluate  bias  inherent  in  Holloman  classification  methods. 
Parameters  are  altered  to  reach  optimal  match  percentages  with  Holloman,  indicating  differences  between  computed  parameter  values  and 
hand-calculated  counterparts.  Group  length  cutoffs  are  shown  to  differ  by  2.5°,  eccentricity  is  quantified  at  0.8,  and  penumbra  length  cutoffs 
are  shown  to  exceed  differences  of  1.4°  from  McIntosh  values. 

15.  SUBJECT  TERMS 

Sunspots,  Automatic,  Detection,  McIntosh  Classification,  Space  Weather,  Bias  Evaluation,  Solar  Dynamics  Observatory, 
Iterative  Thresholding,  Image  Processing 


16.  SECURITY  CLASSIFICATION  OF: 


a.  REPORT 


b.  ABSTRACT!  c.  THIS  PAGE 


17.  LIMITATION  OF 
ABSTRACT 


18.  NUMBER 
OF 

PAGES 


19a.  NAME  OF  RESPONSIBLE  PERSON 

Dr.  Ariel  O.  Acebal,  AFIT/ENP 

19b.  TELEPHONE  NUMBER  (include  area  code) 

(937)  255-3636,  x4518;  ariel.acebal@afit.edu 


Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std.  Z39.18 


